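Vue 3 Source Code Deep Dive: the `ast` Abstract Syntax Tree, Step One of the Template Compiler: Parsing `HTML` - Zhiyuan Academy (智猿学院): lectures on cutting-edge technology in front-end and back-end development, databases, AI, cloud computing and more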

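Good morning, everyone! Today we're going to talk about a key part of the Vue 3 source code: the abstract syntax tree (AST). Don't let the name scare you. It sounds intimidating, but it is just a data structure that represents your HTML template. Building it is the compiler's first step toward understanding your code, much like the grammatical analysis your brain does before it grasps the meaning of a sentence.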

Today's topic is: "Vue 3 Source Code Deep Dive: the AST, Step One of the Template Compiler: Parsing HTML."

Ready? Let's dive in!

I. Why Do We Need an AST?

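First, we need to understand why an AST is needed at all. Browsers can render HTML directly, that's true. But the Vue compiler has to do much more than render. It needs to: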

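- Understand the template's structure: know which parts are elements, attributes, text, and Vue directives.
- Optimize: for example, static node hoisting and event handler caching.
- Generate a render function: compile the template into JavaScript code that, when executed, produces the virtual DOM.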

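Operating directly on the template string is inefficient and error-prone. An AST gives the compiler a structured representation that is far easier to analyze and transform.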
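To see why structure helps, consider the optimization step: before hoisting anything, the compiler must answer questions like "does this subtree contain anything dynamic?" On an AST that is a short recursive check; on a raw string it would mean re-scanning text for `{{` and `v-` prefixes. The sketch below is a simplified illustration, assuming the node shape introduced in section II (`{ type, tag, props, children, content }`); `isStatic` and `collectStaticRoots` are hypothetical names, and Vue's real optimization pass is considerably more nuanced.

```javascript
// Simplified node shapes as used in this article, not Vue's real AST types.

// True when a node and all its descendants have no dynamic parts
// (no interpolation, no directive props).
function isStatic(node) {
  if (node.type === 'INTERPOLATION') return false;
  if (node.type === 'TEXT' || node.type === 'COMMENT') return true;
  if ((node.props || []).some(p => p.type === 'DIRECTIVE')) return false;
  return (node.children || []).every(isStatic);
}

// Collect the tags of the outermost fully static elements: hoisting candidates.
function collectStaticRoots(node, out = []) {
  for (const child of node.children || []) {
    if (child.type === 'ELEMENT' && isStatic(child)) {
      out.push(child.tag); // whole subtree is static, no need to descend
    } else {
      collectStaticRoots(child, out);
    }
  }
  return out;
}

const ast = {
  type: 'ROOT',
  children: [{
    type: 'ELEMENT', tag: 'div', props: [], children: [
      { type: 'ELEMENT', tag: 'h1', props: [], children: [{ type: 'TEXT', content: 'Title' }] },
      { type: 'ELEMENT', tag: 'p', props: [], children: [{ type: 'INTERPOLATION', content: 'message' }] },
    ],
  }],
};

console.log(JSON.stringify(collectStaticRoots(ast))); // ["h1"]
```

The `<div>` itself is not static because its `<p>` child interpolates `message`, so the search descends and finds only the `<h1>` subtree.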

II. The Basic Structure of an AST

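An AST is essentially a tree. Each node represents an element, an attribute, a piece of text, and so on from the HTML template. Common node types include: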

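| Node type | Description | Example |
|---|---|---|
| `ELEMENT` | HTML element | `<div id="app"></div>` |
| `TEXT` | Text node | "Hello World" |
| `ATTRIBUTE` | HTML attribute | `id="app"` |
| `DIRECTIVE` | Vue directive (e.g. `v-if`, `v-for`) | `v-if="isShow"` |
| `INTERPOLATION` | Interpolation (e.g. `{{ message }}`) | `{{ message }}` |
| `COMMENT` | HTML comment | `<!-- This is a comment -->` |
| `ROOT` | Root node representing the whole template | The top-level node of the entire template |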
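One way to keep these node types consistent in code is a set of constants plus small factory functions, so every node of a given type always has the same shape. This is a sketch for this article's simplified AST only: the string constants and the `create*` helper names are hypothetical (Vue 3's own `ast.ts` declares a `NodeTypes` enum with numeric members instead).

```javascript
// String constants mirroring the table above (illustrative, not Vue's enum).
const NodeTypes = Object.freeze({
  ROOT: 'ROOT',
  ELEMENT: 'ELEMENT',
  TEXT: 'TEXT',
  INTERPOLATION: 'INTERPOLATION',
  ATTRIBUTE: 'ATTRIBUTE',
  DIRECTIVE: 'DIRECTIVE',
  COMMENT: 'COMMENT',
});

// Hypothetical factory helpers: one per node type.
const createRoot = (children = []) => ({ type: NodeTypes.ROOT, children });
const createElement = (tag, props = [], children = []) =>
  ({ type: NodeTypes.ELEMENT, tag, props, children });
const createAttribute = (name, value) => ({ type: NodeTypes.ATTRIBUTE, name, value });
const createDirective = (name, exp) => ({ type: NodeTypes.DIRECTIVE, name, exp });
const createText = content => ({ type: NodeTypes.TEXT, content });
const createInterpolation = content => ({ type: NodeTypes.INTERPOLATION, content });
const createComment = content => ({ type: NodeTypes.COMMENT, content });

// Builds the AST for <div id="app">{{ message }}</div>
const ast = createRoot([
  createElement('div', [createAttribute('id', 'app')], [createInterpolation('message')]),
]);
console.log(JSON.stringify(ast));
```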

A simple example:

```html
<div id="app">
  {{ message }}
</div>
```

The corresponding AST (simplified):

```javascript
{
  type: 'ROOT',
  children: [
    {
      type: 'ELEMENT',
      tag: 'div',
      props: [
        {
          type: 'ATTRIBUTE',
          name: 'id',
          value: 'app'
        }
      ],
      children: [
        {
          type: 'INTERPOLATION',
          content: 'message'
        }
      ]
    }
  ]
}
```

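As you can see, the AST represents the HTML structure with plain JavaScript objects: the `type` field distinguishes node types, the `children` field expresses parent-child relationships, and attributes are kept separately in `props`.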
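Because every node exposes `type` and (optionally) `children`, a single generic depth-first walk can visit the whole tree; later compiler stages are built on this kind of traversal. A minimal sketch over the simplified AST above (`walk` is a hypothetical helper; note it follows `children` only, so `props` are not visited):

```javascript
// Depth-first walk: call `visit(node, depth)` for every node in the tree.
function walk(node, visit, depth = 0) {
  visit(node, depth);
  for (const child of node.children || []) {
    walk(child, visit, depth + 1);
  }
}

const ast = {
  type: 'ROOT',
  children: [{
    type: 'ELEMENT', tag: 'div',
    props: [{ type: 'ATTRIBUTE', name: 'id', value: 'app' }],
    children: [{ type: 'INTERPOLATION', content: 'message' }],
  }],
};

// Build an indented outline of the tree.
const lines = [];
walk(ast, (node, depth) => {
  const label = node.tag || node.content || '';
  lines.push('  '.repeat(depth) + node.type + (label ? ' ' + label : ''));
});
console.log(lines.join('\n'));
// ROOT
//   ELEMENT div
//     INTERPOLATION message
```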

III. The Vue 3 Compilation Pipeline

Compiling a Vue 3 template can be broken down into the following steps:

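1. tokenize (lexical analysis): split the HTML string into individual tokens.
2. parse (syntax analysis): turn the token sequence into an AST.
3. transform: transform and optimize the AST, e.g. static node hoisting and directive transforms.
4. generate (code generation): generate the render function from the transformed AST.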

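Today we'll focus on the first two steps: tokenize and parse. (In Vue's actual source these two are not separate passes over the whole template: tokenizing and tree building happen together inside `baseParse`.)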
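Although this lecture concentrates on the first two steps, a toy version of the last step shows where the AST ends up. The sketch below turns the simplified AST from section II into render-function source code and then runs it. Everything about the output format is an assumption for illustration: `h(tag, props, children)` is a stand-in, interpolations are assumed to be plain identifiers read from a hypothetical `_ctx` object, and Vue's real code generator emits its own runtime helpers instead.

```javascript
// Toy code generator for this article's simplified AST (not Vue's codegen).
function genNode(node) {
  switch (node.type) {
    case 'ELEMENT': {
      const props = '{' + node.props
        .map(p => `${JSON.stringify(p.name)}: ${JSON.stringify(p.value)}`)
        .join(', ') + '}';
      const children = '[' + node.children.map(genNode).join(', ') + ']';
      return `h(${JSON.stringify(node.tag)}, ${props}, ${children})`;
    }
    case 'TEXT':
      return JSON.stringify(node.content);
    case 'INTERPOLATION':
      // Assumes the expression is a plain identifier such as `message`
      return `String(_ctx.${node.content})`;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

function generate(root) {
  // Assumes the template has a single top-level element
  return `function render(_ctx) { return ${genNode(root.children[0])} }`;
}

const ast = {
  type: 'ROOT',
  children: [{
    type: 'ELEMENT', tag: 'div',
    props: [{ type: 'ATTRIBUTE', name: 'id', value: 'app' }],
    children: [{ type: 'INTERPOLATION', content: 'message' }],
  }],
};

const code = generate(ast);
console.log(code);
// function render(_ctx) { return h("div", {"id": "app"}, [String(_ctx.message)]) }

// `h` is a stand-in that just builds plain objects (a pretend virtual DOM)
const h = (tag, props, children) => ({ tag, props, children });
const render = new Function('h', `return ${code}`)(h);
console.log(JSON.stringify(render({ message: 'Hello' })));
// {"tag":"div","props":{"id":"app"},"children":["Hello"]}
```

The generated string is ordinary JavaScript; evaluating it once and calling the resulting function with fresh data is what lets a compiled template be re-rendered cheaply.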

3.1 Tokenize (Lexical Analysis)

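The lexer (tokenizer) is responsible for splitting the HTML string into tokens. A token can be thought of as the smallest unit of HTML syntax. The token types used by the simplified tokenizer in this article (these are our own names, not Vue's internal ones) are: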

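| Token type | Description | Example |
|---|---|---|
| `START_TAG_OPEN` | The `<` that opens a start tag | `<` |
| `TAG_NAME` | Tag name | `div` |
| `ATTRIBUTE_NAME` | Attribute name | `id` |
| `ATTRIBUTE_VALUE` | Attribute value (quotes stripped) | `app` |
| `TEXT` | Text content (also used for the expression inside `{{ }}`) | `Hello World` |
| `END_TAG_OPEN` | The `</` that opens an end tag | `</` |
| `CLOSE_TAG` | The `>` that closes a tag | `>` |
| `INTERPOLATION_START` | The `{{` that opens an interpolation | `{{` |
| `INTERPOLATION_END` | The `}}` that closes an interpolation | `}}` |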

Here is an example of a simple tokenize function:

```javascript
// Whitespace test covering spaces, tabs and newlines
const isWhitespace = ch => /\s/.test(ch);

function tokenize(template) {
  const tokens = [];
  let cursor = 0;

  while (cursor < template.length) {
    const char = template[cursor];

    if (char === '<') {
      if (template[cursor + 1] === '/') {
        // End tag: </name>
        tokens.push({ type: 'END_TAG_OPEN', value: '</' });
        cursor += 2;

        let tagName = '';
        while (cursor < template.length && template[cursor] !== '>') {
          tagName += template[cursor];
          cursor++;
        }
        tokens.push({ type: 'TAG_NAME', value: tagName.trim() });
        tokens.push({ type: 'CLOSE_TAG', value: '>' });
        cursor++; // skip '>'
      } else {
        // Start tag: <name attr="value" ...>
        tokens.push({ type: 'START_TAG_OPEN', value: '<' });
        cursor++;

        let tagName = '';
        while (cursor < template.length && !isWhitespace(template[cursor]) && template[cursor] !== '>') {
          tagName += template[cursor];
          cursor++;
        }
        tokens.push({ type: 'TAG_NAME', value: tagName });

        // Attributes
        while (cursor < template.length && template[cursor] !== '>') {
          // Skip whitespace between attributes
          if (isWhitespace(template[cursor])) {
            cursor++;
            continue;
          }

          let attrName = '';
          while (cursor < template.length && template[cursor] !== '=' && !isWhitespace(template[cursor]) && template[cursor] !== '>') {
            attrName += template[cursor];
            cursor++;
          }
          if (attrName) {
            tokens.push({ type: 'ATTRIBUTE_NAME', value: attrName });
          }

          if (template[cursor] === '=') {
            cursor++; // skip '='
            let attrValue = '';
            const quote = template[cursor];
            if (quote === '"' || quote === "'") {
              cursor++; // skip the opening quote
              while (cursor < template.length && template[cursor] !== quote) {
                attrValue += template[cursor];
                cursor++;
              }
              cursor++; // skip the closing quote
            } else {
              // Simplified handling of unquoted values: read until whitespace or '>'
              while (cursor < template.length && !isWhitespace(template[cursor]) && template[cursor] !== '>') {
                attrValue += template[cursor];
                cursor++;
              }
            }
            tokens.push({ type: 'ATTRIBUTE_VALUE', value: attrValue });
          }
        }

        tokens.push({ type: 'CLOSE_TAG', value: '>' });
        cursor++; // skip '>'
      }
    } else if (char === '{' && template[cursor + 1] === '{') {
      // Interpolation: {{ expression }}
      tokens.push({ type: 'INTERPOLATION_START', value: '{{' });
      cursor += 2;

      let content = '';
      // Stop only at the two-character delimiter '}}'
      while (cursor < template.length && !(template[cursor] === '}' && template[cursor + 1] === '}')) {
        content += template[cursor];
        cursor++;
      }
      tokens.push({ type: 'TEXT', value: content.trim() }); // trim removes surrounding spaces
      tokens.push({ type: 'INTERPOLATION_END', value: '}}' });
      cursor += 2; // skip '}}'
    } else {
      // Text node: read until the next '<' or '{{'
      let text = '';
      while (cursor < template.length && template[cursor] !== '<' && !(template[cursor] === '{' && template[cursor + 1] === '{')) {
        text += template[cursor];
        cursor++;
      }
      tokens.push({ type: 'TEXT', value: text });
    }
  }

  return tokens;
}

// Example
const template = `<div id="app">Hello {{ message }}</div>`;
const tokens = tokenize(template);
console.log(tokens);
```

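This tokenize function is only a simplified example. Vue 3's real tokenizer handles many more cases, such as comments, DOCTYPE declarations, and CDATA sections; this sketch does not even handle self-closing tags like `<br/>`.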
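As one example of such an extension: the tokenize function above would treat `<!--` as the start of a tag named `!--`. A sketch of comment support, assuming a hypothetical `readComment` helper (and a new `COMMENT` token type) that the `'<'` branch would call before anything else, could look like this:

```javascript
// Hypothetical helper: if an HTML comment starts at `cursor`, return a COMMENT
// token plus the index just past it; otherwise return null.
function readComment(template, cursor) {
  if (!template.startsWith('<!--', cursor)) return null;
  const end = template.indexOf('-->', cursor + 4);
  const stop = end === -1 ? template.length : end; // unterminated: consume to end of input
  return {
    token: { type: 'COMMENT', value: template.slice(cursor + 4, stop) },
    next: end === -1 ? template.length : end + 3,
  };
}

const src = '<!-- note --><p>hi</p>';
const result = readComment(src, 0);
console.log(JSON.stringify(result)); // {"token":{"type":"COMMENT","value":" note "},"next":13}
console.log(readComment(src, 13));   // null, since '<p>' is not a comment
```

Inside tokenize, a non-null result would be pushed as a token and `cursor` set to `result.next`; a null result falls through to the existing start/end tag handling.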

3.2 Parse (Syntax Analysis)

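The parser turns the token sequence into an AST. This is naturally a recursive process: elements nest inside elements, so an element's children are parsed the same way as the element itself, and the AST is built up step by step according to HTML's grammar rules.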
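That recursion can be written down directly as a recursive-descent parser: parse one node, and if it is an element, parse its children with the same function. The sketch below is a minimal version, assuming the token shapes produced by this article's `tokenize` and a well-formed token stream; `parseRecursive` is a hypothetical name.

```javascript
// Recursive-descent parser over this article's token shapes.
function parseRecursive(tokens) {
  let i = 0;
  const peek = () => tokens[i];
  const next = () => tokens[i++];

  // Parse sibling nodes until an end tag (or the end of input).
  function parseChildren() {
    const children = [];
    while (i < tokens.length && peek().type !== 'END_TAG_OPEN') {
      children.push(parseNode());
    }
    return children;
  }

  function parseNode() {
    const token = next();
    switch (token.type) {
      case 'TEXT':
        return { type: 'TEXT', content: token.value };
      case 'INTERPOLATION_START': {
        const content = next().value; // expression text
        next();                       // INTERPOLATION_END
        return { type: 'INTERPOLATION', content };
      }
      case 'START_TAG_OPEN': {
        const tag = next().value;
        const props = [];
        while (peek().type === 'ATTRIBUTE_NAME') {
          const name = next().value;
          const value = peek().type === 'ATTRIBUTE_VALUE' ? next().value : '';
          props.push({ type: 'ATTRIBUTE', name, value });
        }
        next();                            // CLOSE_TAG of the start tag
        const children = parseChildren();  // recursion handles nesting
        next();                            // END_TAG_OPEN
        const endTag = next().value;       // TAG_NAME
        next();                            // CLOSE_TAG
        if (endTag !== tag) throw new Error(`Expected </${tag}> but found </${endTag}>`);
        return { type: 'ELEMENT', tag, props, children };
      }
      default:
        throw new Error(`Unexpected token ${token.type} at index ${i - 1}`);
    }
  }

  return { type: 'ROOT', children: parseChildren() };
}

// Tokens for <div id="app">Hello {{ message }}</div>
const tokens = [
  { type: 'START_TAG_OPEN', value: '<' }, { type: 'TAG_NAME', value: 'div' },
  { type: 'ATTRIBUTE_NAME', value: 'id' }, { type: 'ATTRIBUTE_VALUE', value: 'app' },
  { type: 'CLOSE_TAG', value: '>' }, { type: 'TEXT', value: 'Hello ' },
  { type: 'INTERPOLATION_START', value: '{{' }, { type: 'TEXT', value: 'message' },
  { type: 'INTERPOLATION_END', value: '}}' }, { type: 'END_TAG_OPEN', value: '</' },
  { type: 'TAG_NAME', value: 'div' }, { type: 'CLOSE_TAG', value: '>' },
];
const ast = parseRecursive(tokens);
console.log(JSON.stringify(ast, null, 2));
```

Here the JavaScript call stack tracks "which element am I inside", the same job an explicit stack array does in an iterative parser.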

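Here is an example of a simple parse function. Rather than recursing, it tracks the current parent with an explicit stack: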

```javascript
function parse(tokens) {
  let cursor = 0;
  const root = {
    type: 'ROOT',
    children: []
  };
  const stack = [root]; // the stack maintains parent-child relationships
  const currentParent = () => stack[stack.length - 1];

  while (cursor < tokens.length) {
    const token = tokens[cursor];

    switch (token.type) {
      case 'START_TAG_OPEN': {
        cursor++; // skip '<'
        const tagNameToken = tokens[cursor];
        cursor++;
        const elementNode = {
          type: 'ELEMENT',
          tag: tagNameToken.value,
          props: [],
          children: []
        };

        // Attributes; the value is optional (e.g. boolean attributes like `disabled`)
        while (cursor < tokens.length && tokens[cursor].type === 'ATTRIBUTE_NAME') {
          const name = tokens[cursor].value;
          cursor++;
          let value = '';
          if (cursor < tokens.length && tokens[cursor].type === 'ATTRIBUTE_VALUE') {
            value = tokens[cursor].value;
            cursor++;
          }
          elementNode.props.push({ type: 'ATTRIBUTE', name, value });
        }
        cursor++; // skip the '>' (CLOSE_TAG) that ends the start tag

        currentParent().children.push(elementNode); // add to the parent's children
        stack.push(elementNode); // push: it becomes the new parent
        break;
      }

      case 'END_TAG_OPEN': {
        const tagName = tokens[cursor + 1].value;
        // Minimal error handling: the end tag must close the element on top of the stack
        if (stack.length === 1 || currentParent().tag !== tagName) {
          throw new Error(`Unexpected end tag </${tagName}>`);
        }
        stack.pop(); // end tag: pop the element off the stack
        cursor += 3; // skip '</', TAG_NAME and '>'
        break;
      }

      case 'TEXT':
        currentParent().children.push({ type: 'TEXT', content: token.value });
        cursor++;
        break;

      case 'INTERPOLATION_START': {
        cursor++; // skip '{{'
        const contentToken = tokens[cursor];
        cursor++;
        currentParent().children.push({
          type: 'INTERPOLATION',
          content: contentToken.value
        });
        cursor++; // skip '}}' (a single INTERPOLATION_END token)
        break;
      }

      default:
        cursor++;
    }
  }

  return root;
}

// Example (uses `tokens` from the tokenize example above)
const ast = parse(tokens);
console.log(JSON.stringify(ast, null, 2));
```

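This parse function is also just a simplified example. Vue 3's real parser handles many more cases, such as directives, expressions, and full error handling. The key idea is the `stack`: a start tag pushes a new element, which becomes the parent of everything that follows, and the matching end tag pops it.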
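The same stack is also the natural place for error handling, because a mismatched end tag shows up as a disagreement with the top of the stack. The sketch below is a hypothetical checker, `checkTagBalance`, not Vue's error reporting. It walks a token stream in this article's format and recovers roughly the way browsers do: an end tag closes its nearest matching open element, and anything opened after it is reported as missing its end tag. It assumes a `TAG_NAME` token always follows `<` and `</`.

```javascript
// Hypothetical stack-based validator for this article's token format.
function checkTagBalance(tokens) {
  const stack = [];
  const errors = [];
  for (let i = 0; i < tokens.length; i++) {
    const t = tokens[i];
    if (t.type === 'START_TAG_OPEN') {
      stack.push(tokens[i + 1].value); // TAG_NAME follows '<'
    } else if (t.type === 'END_TAG_OPEN') {
      const name = tokens[i + 1].value; // TAG_NAME follows '</'
      const idx = stack.lastIndexOf(name);
      if (idx === -1) {
        errors.push(`Stray </${name}> with no matching start tag`);
      } else {
        // Elements opened after `name` were never closed; close them implicitly
        while (stack.length - 1 > idx) errors.push(`Missing </${stack.pop()}>`);
        stack.pop(); // the matching start tag
      }
    }
  }
  // Anything still open at the end of input lacks an end tag
  while (stack.length) errors.push(`Missing </${stack.pop()}>`);
  return errors;
}

// Small helpers to build token sequences for the examples
const open = tag => [
  { type: 'START_TAG_OPEN', value: '<' }, { type: 'TAG_NAME', value: tag }, { type: 'CLOSE_TAG', value: '>' },
];
const close = tag => [
  { type: 'END_TAG_OPEN', value: '</' }, { type: 'TAG_NAME', value: tag }, { type: 'CLOSE_TAG', value: '>' },
];

console.log(JSON.stringify(checkTagBalance([...open('div'), ...open('span'), ...close('span'), ...close('div')])));
// []
console.log(JSON.stringify(checkTagBalance([...open('div'), ...open('span'), ...close('div')])));
// ["Missing </span>"]
console.log(JSON.stringify(checkTagBalance([...open('p'), ...close('div')])));
// ["Stray </div> with no matching start tag","Missing </p>"]
```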

IV. The AST in the Vue 3 Source Code

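In the Vue 3 source, the AST-related code lives mainly in `packages/compiler-core/src`. The core files are:

- `parse.ts` (Vue 3.0 to 3.3), split into `tokenizer.ts` and `parser.ts` from Vue 3.4 on: lexical and syntax analysis.
- `ast.ts`: type and interface definitions for AST nodes.
- `transform.ts`: the AST transform pass.
- `codegen.ts`: code generation.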

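Since Vue 3.4, parsing is driven by a state-machine tokenizer (adapted from htmlparser2) that scans the template in a single pass, which is more flexible and efficient; earlier 3.x releases used a hand-written recursive-descent parser. Vue 3's AST nodes also carry much more information than the simplified ones here (for example, source locations), which makes the later transform and optimization steps easier.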
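To make the state-machine idea concrete: instead of nested loops, a single loop reads one character at a time, and the current `state` alone decides what that character means. The sketch below is only an illustration of the pattern. The state names, token shapes and the `tokenizeFSM` name are our own, attributes are skipped entirely, and a `>` inside an attribute value would end the tag early. Vue's real tokenizer has many more states and is far more complete.

```javascript
// Our own state names for a minimal state-machine tokenizer.
const State = Object.freeze({
  Text: 'Text',                   // plain text between tags
  TagOpen: 'TagOpen',             // just saw '<'
  TagName: 'TagName',             // reading a start-tag name
  InTag: 'InTag',                 // after the tag name, until '>' (attributes ignored)
  EndTagName: 'EndTagName',       // reading an end-tag name after '</'
  Interpolation: 'Interpolation', // inside {{ ... }}
});

function tokenizeFSM(input) {
  const tokens = [];
  let state = State.Text;
  let buf = '';
  const emit = (type, value) => { tokens.push({ type, value }); buf = ''; };

  for (let i = 0; i < input.length; i++) {
    const c = input[i];
    switch (state) {
      case State.Text:
        if (c === '<') {
          if (buf) emit('text', buf);
          state = State.TagOpen;
        } else if (c === '{' && input[i + 1] === '{') {
          if (buf) emit('text', buf);
          state = State.Interpolation;
          i++; // consume the second '{'
        } else {
          buf += c;
        }
        break;
      case State.TagOpen:
        if (c === '/') state = State.EndTagName;
        else { buf = c; state = State.TagName; }
        break;
      case State.TagName:
        if (c === '>') { emit('openTag', buf); state = State.Text; }
        else if (/\s/.test(c)) { emit('openTag', buf); state = State.InTag; }
        else buf += c;
        break;
      case State.InTag:
        if (c === '>') state = State.Text; // attributes are skipped in this sketch
        break;
      case State.EndTagName:
        if (c === '>') { emit('closeTag', buf.trim()); state = State.Text; }
        else buf += c;
        break;
      case State.Interpolation:
        if (c === '}' && input[i + 1] === '}') {
          emit('interpolation', buf.trim());
          state = State.Text;
          i++; // consume the second '}'
        } else {
          buf += c;
        }
        break;
    }
  }
  if (state === State.Text && buf) emit('text', buf); // trailing text
  return tokens;
}

const result = tokenizeFSM('<div id="app">Hello {{ message }}</div>');
console.log(JSON.stringify(result));
// [{"type":"openTag","value":"div"},{"type":"text","value":"Hello "},{"type":"interpolation","value":"message"},{"type":"closeTag","value":"div"}]
```

Adding a feature (comments, attributes, self-closing tags) then means adding states and transitions rather than another nested loop, which is a large part of what makes the pattern flexible.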

V. Summary

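The AST is the key to how the Vue compiler understands templates. Lexical analysis and syntax analysis turn an HTML template into a structured data representation that the later transform, optimization, and code generation steps can work with.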

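Today's lecture is only a simple introduction; the actual Vue 3 source is far more complex and refined. I hope it has given you a basic understanding of ASTs and a starting point for studying the Vue 3 source in more depth.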

VI. Homework

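- Modify the tokenize function so that it handles HTML comments.
- Run the parse function on nested HTML elements (for example `<div><p>{{ a }}</p><p>b</p></div>`) and check that every node ends up under the correct parent.
- Read `packages/compiler-core/src/parse.ts` in the Vue 3 source (or `tokenizer.ts` and `parser.ts` in Vue 3.4 and later) to see how Vue 3 actually parses templates.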

That's all for today's lecture. Happy learning, and see you next time!
