Writing a JavaScript Interpreter from Scratch

Question

Writing a JavaScript Interpreter from Scratch

Opened this issue 7 years ago · 4 comments

axetroy commented 7 years ago

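I have been studying ASTs lately. An earlier article of mine was titled "Interviewer: Do you know Babel? Have you written a Babel plugin? Answer: No. (The end.)"
Why bother learning about ASTs? Because once you understand them, you really can do almost anything.

Put simply: use JavaScript to run JavaScript code.

This article shows how to write the simplest possible interpreter.

Preface (skip this if you already know how to execute custom JS code)

How many ways of executing a custom script can you think of? Let's list them: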

Web

Create a script element and insert it into the document

function runJavascriptCode(code) {
  const script = document.createElement("script");
  script.innerText = code;
  document.body.appendChild(script);
}

runJavascriptCode("alert('hello world')");

eval

Countless people say not to use eval, even though it can execute custom scripts

eval("alert('hello world')");

Reference: Why is using the JavaScript eval function a bad idea?

setTimeout

setTimeout can run code as well, but the work is deferred to a later turn of the event loop

setTimeout("console.log('hello world')");
console.log("I should run first");

// Output
// I should run first
// hello world

new Function

new Function("alert('hello world')")();

Reference: Are eval() and new Function() the same thing?
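As a small illustration of the new Function approach: the constructor also accepts parameter names ahead of the body string, and the resulting function only sees the global scope. This is a minimal sketch; the names add, outer, secret, and probe are made up for the example.

```javascript
// The last argument is the function body; the ones before it are parameter names.
const add = new Function("a", "b", "return a + b;");

console.log(add(1, 2)); // 3

// Functions created this way are compiled in the global scope,
// so they cannot see local variables of the place where they were created.
function outer() {
  const secret = 42;
  const probe = new Function("return typeof secret;");
  return probe();
}

console.log(outer()); // "undefined"
```

This scoping difference is one of the main ways new Function differs from a direct eval call, which can read the surrounding local variables.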

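Node.js

require

You can put JavaScript code into a JS file and then require it from another file, which has the effect of executing it.

Node.js caches modules. If you execute N such files, this can use a lot of memory, so the cache has to be cleared manually after execution.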
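A minimal sketch of clearing the module cache by hand, assuming CommonJS modules. The temporary file name snippet.js and the global counter __demoRuns are hypothetical and exist only to make the effect visible.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Write a throwaway module into a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "require-demo-"));
const file = path.join(dir, "snippet.js");
fs.writeFileSync(
  file,
  "global.__demoRuns = (global.__demoRuns || 0) + 1; module.exports = global.__demoRuns;"
);

const first = require(file);  // runs the file
const cached = require(file); // served from require.cache, the file does not run again

// Drop the cache entry so the next require executes the file again.
delete require.cache[require.resolve(file)];
const fresh = require(file);

console.log(first === cached); // true
console.log(fresh === first + 1); // true
```

Deleting the entry only removes Node's own reference to the module; anything else still holding the exported value keeps it alive.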

vm

const vm = require("vm");

const sandbox = {
  animal: "cat",
  count: 2
};

vm.runInNewContext('count += 1; name = "kitty"', sandbox);
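To make the effect of the call above visible, here is the same snippet followed by an inspection of the sandbox object. The second call is an extra check added for illustration: code in the new context only sees what the sandbox provides.

```javascript
const vm = require("vm");

const sandbox = {
  animal: "cat",
  count: 2
};

// The script reads and writes properties of the sandbox as if they were globals.
vm.runInNewContext('count += 1; name = "kitty"', sandbox);

console.log(sandbox); // { animal: 'cat', count: 3, name: 'kitty' }

// Node-specific globals are not available unless they are passed in.
const seenProcess = vm.runInNewContext("typeof process", {});
console.log(seenProcess); // undefined
```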

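Of the approaches above, only Node handles this gracefully. None of the others work everywhere, because every one of these APIs depends on the host environment.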

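What an interpreter is for

Executing custom code on any platform that can run JavaScript.

Mini programs, for example, block every one of the approaches listed above

Does that mean custom code really cannot be executed there?

Not at all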

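How it works

Working from the AST (abstract syntax tree), find the corresponding object or method, then evaluate the corresponding expression.

That sounds a bit convoluted, so here is an example: console.log("hello world");

How it works: walk the AST to find the console object, then find its log function, and finally call that function with the argument hello world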
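The idea can be sketched without any parser at all. The object below is a hand-written, trimmed-down approximation of the tree a Babel-style parser produces for console.log("hello world"); real nodes carry extra fields such as source positions. The variable names are illustrative only.

```javascript
// Hand-written approximation of the AST for: console.log("hello world")
const ast = {
  type: "File",
  program: {
    type: "Program",
    body: [
      {
        type: "ExpressionStatement",
        expression: {
          type: "CallExpression",
          callee: {
            type: "MemberExpression",
            object: { type: "Identifier", name: "console" },
            property: { type: "Identifier", name: "log" }
          },
          arguments: [{ type: "StringLiteral", value: "hello world" }]
        }
      }
    ]
  }
};

// Follow the path by hand: File -> Program -> ExpressionStatement -> CallExpression
const call = ast.program.body[0].expression;
const objectName = call.callee.object.name;        // "console"
const methodName = call.callee.property.name;      // "log"
const args = call.arguments.map(arg => arg.value); // ["hello world"]

// Look the object up in a context we control, then call the method.
const context = { console };
context[objectName][methodName](...args); // prints: hello world
```

An interpreter does the same thing, except that the walk is driven by each node's type instead of being hard-coded.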

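Tools

Babylon, to parse the code and produce the AST
babel-types, to check node types
astexplorer, to inspect the abstract syntax tree at any time

Let's write some code

We will use running console.log("hello world") as the example

Open astexplorer and look at the corresponding AST

As the tree shows, to reach console.log("hello world") we have to traverse downward through the File, Program, ExpressionStatement, CallExpression, and MemberExpression nodes, and along the way we also meet Identifier and StringLiteral nodes

First we define visitors, which holds the handler for each node type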

const visitors = {
  File(){},
  Program(){},
  ExpressionStatement(){},
  CallExpression(){},
  MemberExpression(){},
  Identifier(){},
  StringLiteral(){}
};

Next, define a function that evaluates a node

/**
 * Evaluate a single node
 * @param {Node} node the node object
 * @param {*} scope the scope
 */
function evaluate(node, scope) {
  const _evaluate = visitors[node.type];
  // Throw if there is no handler for this node type
  if (!_evaluate) {
    throw new Error(`Unknown visitors of ${node.type}`);
  }
  // Run the handler for this node
  return _evaluate(node, scope);
}
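To see the dispatch in action before wiring up a parser, here is a self-contained sketch with only two handlers and hand-written nodes. The local name handler and the sample nodes are assumptions made for the example.

```javascript
const visitors = {
  ExpressionStatement(node, scope) {
    return evaluate(node.expression, scope);
  },
  StringLiteral(node) {
    return node.value;
  }
};

function evaluate(node, scope) {
  const handler = visitors[node.type];
  if (!handler) {
    throw new Error(`Unknown visitors of ${node.type}`);
  }
  return handler(node, scope);
}

// A statement whose expression is a string literal.
const statement = {
  type: "ExpressionStatement",
  expression: { type: "StringLiteral", value: "hello world" }
};
const result = evaluate(statement, {});
console.log(result); // hello world

// A node type without a handler is reported instead of being silently ignored.
let message = "";
try {
  evaluate({ type: "NumericLiteral", value: 2 }, {});
} catch (error) {
  message = error.message;
}
console.log(message); // Unknown visitors of NumericLiteral
```

Failing loudly on unknown node types is what makes it practical to grow the interpreter one node at a time.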

Here is the implementation of the handler for each node type

const babylon = require("babylon");
const types = require("babel-types");

const visitors = {
  File(node, scope) {
    evaluate(node.program, scope);
  },
  Program(program, scope) {
    for (const node of program.body) {
      evaluate(node, scope);
    }
  },
  ExpressionStatement(node, scope) {
    return evaluate(node.expression, scope);
  },
  CallExpression(node, scope) {
    // Evaluate the callee to get the function
    const func = evaluate(node.callee, scope);

    // Evaluate the arguments
    const funcArguments = node.arguments.map(arg => evaluate(arg, scope));

    // Property access such as console.log
    if (types.isMemberExpression(node.callee)) {
      const object = evaluate(node.callee.object, scope);
      return func.apply(object, funcArguments);
    }
  },
  MemberExpression(node, scope) {
    const { object, property } = node;

    // Find the property name
    const propertyName = property.name;

    // Find the object
    const obj = evaluate(object, scope);

    // Read the value
    const target = obj[propertyName];

    // Return the value; if it is a function, bind this to the object
    return typeof target === "function" ? target.bind(obj) : target;
  },
  Identifier(node, scope) {
    // Look up the variable's value
    return scope[node.name];
  },
  StringLiteral(node) {
    return node.value;
  }
};

function evaluate(node, scope) {
  const _evaluate = visitors[node.type];
  if (!_evaluate) {
    throw new Error(`Unknown visitors of ${node.type}`);
  }
  // Handlers call evaluate again, so this recurses through the tree
  return _evaluate(node, scope);
}

const code = "console.log('hello world')";

// Generate the AST
const ast = babylon.parse(code);

// Evaluate the AST
// The execution context must be passed in, otherwise the console object cannot be found
evaluate(ast, { console: console });

Try running it in Node.js

$ node ./index.js
hello world

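Now change the code being run to const code = "console.log(Math.pow(2, 2))";

Because the context has no Math object, we get this error: TypeError: Cannot read property 'pow' of undefined

Remember to pass it in through the context: evaluate(ast, {console, Math});

Run it again and we get another error: Error: Unknown visitors of NumericLiteral

It turns out that the 2 in Math.pow(2, 2) is a numeric literal

Its node type is NumericLiteral, but we never defined a handler for that node in visitors.

So let's add one: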

NumericLiteral(node) {
  return node.value;
}

Run it once more and the result matches what we expect

$ node ./index.js
4

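At this point the most basic kind of function call works

Going further

This is an interpreter, so can it only run hello world? Obviously not

Let's declare a variable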

var name = "hello world";
console.log(name);

First look at the AST structure

visitors is missing handlers for the VariableDeclaration and VariableDeclarator nodes, so we add them

VariableDeclaration(node, scope) {
  for (const declarator of node.declarations) {
    const { name } = declarator.id;
    // A declarator without an initializer (var a;) gets undefined
    const value = declarator.init
      ? evaluate(declarator.init, scope)
      : undefined;
    scope[name] = value;
  }
},
VariableDeclarator(node, scope) {
  // Guard against a missing initializer, as in var a;
  scope[node.id.name] = node.init ? evaluate(node.init, scope) : undefined;
}

Run the code and it prints hello world

Now let's declare a function

function test() {
  var name = "hello world";
  console.log(name);
}
test();

Following the same steps as above, we add a few more node handlers

BlockStatement(block, scope) {
  for (const node of block.body) {
    // Execute each statement in the block
    evaluate(node, scope);
  }
},
FunctionDeclaration(node, scope) {
  // Build the function
  const func = visitors.FunctionExpression(node, scope);

  // Define the function in the scope
  scope[node.id.name] = func;
},
FunctionExpression(node, scope) {
  // Construct a function of our own
  const func = function() {
    // TODO: read the function's arguments
    // Execute the statements in the function body
    evaluate(node.body, scope);
  };

  // Return this function
  return func;
}

Then modify CallExpression

// Property access such as console.log
if (types.isMemberExpression(node.callee)) {
  const object = evaluate(node.callee.object, scope);
  return func.apply(object, funcArguments);
} else if (types.isIdentifier(node.callee)) {
  // Added: a plain call such as test()
  return func.apply(scope, funcArguments); // Added
}

Running this also prints hello world
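The function handler above still ignores parameters and return values. The following self-contained sketch shows one possible way to fill in that gap, using a hand-written AST instead of a parser. ReturnSignal, callScope, and the BinaryExpression and ReturnStatement handlers are assumptions made for this example and are not taken from the article's code or from vm.js.

```javascript
// Marker object that carries a return value out of nested statements.
class ReturnSignal {
  constructor(value) {
    this.value = value;
  }
}

const visitors = {
  Identifier(node, scope) {
    return scope[node.name];
  },
  NumericLiteral(node) {
    return node.value;
  },
  BinaryExpression(node, scope) {
    const left = evaluate(node.left, scope);
    const right = evaluate(node.right, scope);
    if (node.operator === "+") return left + right;
    throw new Error(`Unsupported operator ${node.operator}`);
  },
  ReturnStatement(node, scope) {
    const value = node.argument ? evaluate(node.argument, scope) : undefined;
    return new ReturnSignal(value);
  },
  BlockStatement(block, scope) {
    for (const node of block.body) {
      const result = evaluate(node, scope);
      // Stop at the first return and pass the signal upward.
      if (result instanceof ReturnSignal) return result;
    }
  },
  FunctionExpression(node, scope) {
    return function (...args) {
      // Each call gets its own scope object that falls back to the defining scope.
      const callScope = Object.create(scope);
      node.params.forEach((param, index) => {
        callScope[param.name] = args[index]; // only plain identifier parameters
      });
      const result = evaluate(node.body, callScope);
      return result instanceof ReturnSignal ? result.value : undefined;
    };
  }
};

function evaluate(node, scope) {
  const handler = visitors[node.type];
  if (!handler) {
    throw new Error(`Unknown visitors of ${node.type}`);
  }
  return handler(node, scope);
}

// Hand-written AST for: function (a, b) { return a + b; }
const addNode = {
  type: "FunctionExpression",
  params: [
    { type: "Identifier", name: "a" },
    { type: "Identifier", name: "b" }
  ],
  body: {
    type: "BlockStatement",
    body: [
      {
        type: "ReturnStatement",
        argument: {
          type: "BinaryExpression",
          operator: "+",
          left: { type: "Identifier", name: "a" },
          right: { type: "Identifier", name: "b" }
        }
      }
    ]
  }
};

const add = evaluate(addNode, {});
console.log(add(1, 2)); // 3
```

Using Object.create for the call scope keeps reads of outer variables working through the prototype chain, but an assignment inside the function would create a shadowing property instead of updating the outer variable, so a real implementation needs an explicit scope chain.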

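Complete example code

Other notes

For reasons of space I will not cover how to handle every node. The sections above explain the basic principle.

You can handle the other nodes in the same way. One thing to watch out for: above I used a single scope object throughout, with no distinction between parent and child scopes

Which means that code like this runs without error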

var a = 1;
function test() {
  var b = 2;
}
test();
console.log(b); // 2

The fix: while recursing through the AST, nodes that create a child scope, such as function and for...in, should be evaluated with a new scope
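One common way to model parent and child scopes is a chain of scope objects, each holding its own variables and a link to its parent. The Scope class below is a hypothetical sketch of that idea, not the implementation used by vm.js.

```javascript
class Scope {
  constructor(parent = null) {
    this.parent = parent;
    this.variables = new Map();
  }

  // Declare a variable in this scope only.
  declare(name, value) {
    this.variables.set(name, value);
  }

  // Walk up the chain until a scope that owns the name is found.
  resolve(name) {
    let scope = this;
    while (scope) {
      if (scope.variables.has(name)) return scope;
      scope = scope.parent;
    }
    return null;
  }

  get(name) {
    const owner = this.resolve(name);
    if (!owner) throw new ReferenceError(`${name} is not defined`);
    return owner.variables.get(name);
  }

  // Simplification: assigning to an undeclared name throws, as in strict mode.
  set(name, value) {
    const owner = this.resolve(name);
    if (!owner) throw new ReferenceError(`${name} is not defined`);
    owner.variables.set(name, value);
  }
}

const globalScope = new Scope();
globalScope.declare("a", 1);

// A new child scope would be created each time test() is called.
const functionScope = new Scope(globalScope);
functionScope.declare("b", 2);

console.log(functionScope.get("a")); // 1, found through the parent
functionScope.set("a", 10);
console.log(globalScope.get("a")); // 10, the outer variable was updated

let leaked = true;
try {
  globalScope.get("b");
} catch (error) {
  leaked = !(error instanceof ReferenceError);
}
console.log(leaked); // false, b is not visible outside the function
```

With this shape, the Identifier handler would call scope.get(node.name) and the function handler would create new Scope(scope) for every call.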

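Finally

The above is only a simple model. It does not even count as a toy, and it still has plenty of pitfalls. For example:

Variable hoisting: scopes need a pre-parse phase
Scoping has many problems
Some nodes must be nested inside a particular node. For example, super() must be inside a Class node, no matter how deeply it is nested
this binding
...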
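The pre-parse phase mentioned in the first item can be sketched as a pass that collects declared names before any statement runs. collectHoistedNames is a hypothetical helper, and the nodes are hand-written approximations of parser output.

```javascript
// Collect the names introduced by `var` and function declarations in a list of
// statements. Simplification: only top-level statements are inspected, so a
// `var` nested inside an if or for block is not found.
function collectHoistedNames(statements) {
  const names = { vars: [], functions: [] };
  for (const node of statements) {
    if (node.type === "VariableDeclaration" && node.kind === "var") {
      for (const declarator of node.declarations) {
        names.vars.push(declarator.id.name);
      }
    } else if (node.type === "FunctionDeclaration") {
      names.functions.push(node.id.name);
    }
  }
  return names;
}

// Hand-written nodes for: var a = 1; function get() {} var b;
const body = [
  {
    type: "VariableDeclaration",
    kind: "var",
    declarations: [
      {
        type: "VariableDeclarator",
        id: { type: "Identifier", name: "a" },
        init: { type: "NumericLiteral", value: 1 }
      }
    ]
  },
  {
    type: "FunctionDeclaration",
    id: { type: "Identifier", name: "get" },
    params: [],
    body: { type: "BlockStatement", body: [] }
  },
  {
    type: "VariableDeclaration",
    kind: "var",
    declarations: [
      {
        type: "VariableDeclarator",
        id: { type: "Identifier", name: "b" },
        init: null
      }
    ]
  }
];

const hoisted = collectHoistedNames(body);
console.log(hoisted); // { vars: [ 'a', 'b' ], functions: [ 'get' ] }

// Pre-declare the variables before executing anything.
const scope = {};
for (const name of hoisted.vars) {
  if (!(name in scope)) scope[name] = undefined;
}
console.log(Object.keys(scope)); // [ 'a', 'b' ]
```

Function declarations would additionally be evaluated and stored during this pass, so that a call placed before the declaration in the source still finds the function.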

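After several late nights in a row, I wrote a more complete library, vm.js. It is adapted from jsjs, so it stands on the shoulders of giants.

How it differs from jsjs:

Reworked the recursion, which solved some problems that could not be solved before
Fixed a number of bugs
Added test cases
Supports ES6 and other syntactic sugar

It is still under development. The first version will be released once it is more complete.

Criticism and PRs are welcome.

One day mini programs become big programs: the business code is pushed over WebSocket and executed, and the mini program's own source is just an empty shell. Exciting just to think about.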

Project: https://github.com/axetroy/vm.js

Live demo: http://axetroy.github.io/vm.js/

Original post: http://axetroy.xyz/#/post/172

huangw1 commented 7 years ago

Great work!

Answer 1 · 2018-03-13T07:19:11.000Z

Noting down a pitfall I ran into while writing test cases

var a = (get() , 2);
var b;

function get(){
  b = 3;
}

module.exports = {a: a, b: b};

Actual output: a = 2, b = undefined;

Cause

Pre-parsing (hoisting) is at work here. It is enough to think of the code as transformed into the following

var a;
var b;
a = (get() , 2);
b = undefined;

function get(){
  b = 3;
}

module.exports = {a: a, b: b};
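For comparison, here is what a native engine does with the original snippet. The code is wrapped in a function (the name run is only for illustration) so that it is self-contained. A var b; without an initializer only declares the variable and does not reset it, so the value written by get() survives. An interpreter that treats var b; as b = undefined at that position would therefore report b as undefined instead.

```javascript
function run() {
  var a = (get(), 2); // get is hoisted, so it can be called here
  var b;              // no initializer: b keeps whatever value it already has

  function get() {
    b = 3;
  }

  return { a: a, b: b };
}

const native = run();
console.log(native); // { a: 2, b: 3 }
```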

Answer 2 · 2018-05-17T01:04:34.000Z

How do I set up this blog after forking it?

Answer 3 · 2022-02-18T09:11:32.000Z

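Hi, I have been studying your code recently and have a question. In the CallExpression handler, isn't the logic context = path.evaluate(path.createChild(node.callee.object)) redundant? As far as I can see, the func obtained by recursively evaluating MemberExpression has already been bound to its context. The problem this causes is that in some cases the inner logic gets executed more than once. Take this case, for example:
new Promise((resolve) => { console.log("testing"); resolve() }).then().then()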