overlookmotel/livepack

Contain `eval()` in a cage in instrumentation

overlookmotel opened this issue · 6 comments

There are many problems and annoying edge cases with eval(). Solving them all in output code is very tricky (#278), but should at least be possible to simplify handling it in instrumentation.

I made a good attempt to solve it in #137 (comment) but the implementation has become tortuously complex.

To prevent eval()'s effects leaking outside of it, and requiring workarounds all over the instrumentation code, it'd be better to try to contain its effects by running it in a global scope "cage" and injecting all external variables into that environment.

e.g. input:

const x = 123;
function f(y) {
  return eval('() => [x, y += 100, this, arguments]');
}
module.exports = f();

Instrumented:

const x = 123;
function f(y) {
  let livepack_temp1;
  return (
    livepack_temp1 = livepack_tracker.evalDirect(
      [ '() => [x, y += 100, this, arguments]' ],
      (eval, livepack_temp2) => eval(livepack_temp2),
      () => eval,
      v => eval = v
    ),
    livepack_temp1.eval && eval(livepack_temp1.eval),
    livepack_temp1.exec()
  );
}
module.exports = f();

evalDirect() would instrument the the eval-ed code as:

(_get, _set, livepack_tracker, livepack_getScopeId) => () => [
  _get('x'),
  _set('y', _get('y') + 100),
  _get('this'),
  _get('arguments')
]

evalDirect() would return an object with .eval and .exec properties.

.eval() is called first to create any bindings in outer scope due to var statements or sloppy function declarations in the eval-ed code (these can be determined statically).

.exec() is called 2nd.

  • It calls the instrumented function, injecting _get and _set.
  • The instrumented function can be called with indirect eval ((0, eval)(code)).
  • _get and _set use the 2rd argument to evalDirect() to execute get/set operations in the surrounding context.
  • _get and _set use the 3rd + 4th arguments if the code inside eval itself gets or sets an external var eval.
  • _get and _set can remove themselves from stack traces where there's an error.

Previous attempts to solve the problems of eval() have aimed to make minimal changes to the eval-ed code and use cunning wheezes to inject vars with with (...) {}, and other ploys to inject this, arguments, new.target and super.

The above suggestion takes a different approach. The changes to the eval-ed code are very invasive. However, as both the code inside and outside the eval() is statically analysed, we can know which variables the eval code accesses, so it's possible to translate it in full.

e.g.:

  • _set() can be passed a third argument for whether the position where _set() is called is strict or sloppy mode, so it can throw an error or fail silently accordingly if the assignment cannot succeed.
  • var statements in sloppy mode create bindings in the outer function. This can be determined statically and executed.
  • delete x can also be handled.

Despite the invasiveness, I actually think it's simpler overall than other attempted solutions, because all the actions are confined to within eval() itself.

Actually, there is one case I'm not sure how to handle.

function f() {
  eval('eval("var x = 123")');
  return x;
}

The inner eval() creates a binding in outer scope. But this can't be statically deduced before running the code.

Argh!

Maybe could solve by adding a with () {} to the outer function:

// Instrumented
function f() {
  const livepack_dynamicVars = Object.create(null);
  with (livepack_dynamicVars) {
    livepack_tracker.evalDirect(
      'eval("var x = 123")', ..., livepack_dynamicVars
    );
    return x;
  }
}

Any var declarations could be identified when the inner eval() is executed, and added as properties to livepack_dynamicVars (if eval-ed code fails to run with a syntax error, they could be removed again before they're observable elsewhere).

NB with() {} is only valid in sloppy mode, but that's fine because eval() in strict mode can't create external bindings anyway. So the with () {} can be omitted in strict mode.

Yes, actually above idea might be quite good.

Source (original example):

const x = 123;
function f(y) {
  return eval('() => [x, y += 100, this, arguments]');
}
module.exports = f();

Instrumented:

const x = 123;
function f(y) {
  const livepack_dynamicVars = Object.create(null),
    livepack_dynamicVarsEvalOnly = Object.create(null);
  with (livepack_dynamicVars) {
    return livepack_tracker.evalDirect(
      [ '() => [x, y += 100, this, arguments]' ],
      eval,
      {
        module: [ () => module, v => module = v ],
        exports: [ () => exports, v => exports = v ],
        require: [ () => require, v => require = v ],
        x: () => x,
        f: [ () => f, v => f = v ],
        y: [ () => y, v => y = v ],
        this: () => this,
        arguments: [ () => arguments, v => arguments = v ],
        'new.target': () => new.target
      },
      livepack_dynamicVars,
      livepack_dynamicVarsEvalOnly
    );
  }
}
module.exports = f();

There are no direct eval() calls left at all.

livepack_dynamicVars can also be used in the tracker when serializing functions to determine if the function uses any of these dynamically declared vars.

This could even handle vars created with names which clash with Livepack's internal vars in outer scope e.g. var livepack_tracker. Any such vars can only be accessed within eval()s because if they appeared in the outer code, Livepack's vars would have been named with a higher prefix num so as not to shadow them. So any such vars can be stored in a 2nd object which is only used for eval(), and not included in the with() {} context. This is what livepack_dynamicVarsEvalOnly is for.

Would also integrate nicely with support for with () {} (#136 (comment)).

In addition to _get and _set, a 3rd function _delete could be used to delete dynamic bindings. i.e. delete x would be replaced with _delete('x').

Function declarations also behave the same as var declarations in sloppy mode eval (including where they're hoisted from an nested block).

(0, eval)('function a() {}');
assert(typeof global.a === 'function');

(0, eval)('{ function b() {} }');
assert(typeof global.b === 'function');

(() => {
  eval('function c() {}');
  assert(typeof c === 'function');
  assert(typeof global.c === 'undefined');

  eval('{ function d() {} }');
  assert(typeof d === 'function');
  assert(typeof global.d === 'undefined');
})();

For indirect eval, need to wrap eval-ed code in a function to inject Livepack's internal vars and then execute the eval-ed code within a direct eval() call to get its completion value.

If eval-ed code is: var x = 123; 456, the generated code defining this function would be:

(livepack_tracker, livepack_getScopeId, livepack_temp1, livepack_temp2) => {
  with (livepack_temp1) {
    return eval(`
      livepack_temp2('var x');
      x = 123;
      456
    `);
  }
}

Function would be called with:

const nativeEval = global.eval; // Before it's shimmed
const withObj = {
  __proto__: null,
  get eval() {
    delete this.eval;
    Object.freeze(this);
    return nativeEval;
  }
};
f(tracker, getScopeId, withObj, nativeEval);

The with () {} block serves to temporarily provide the native eval function for the nested eval() call, and then destroys it before the eval-ed code executes, leaving the eval-ed code to run in an unaltered global context, with eval pointing to global.eval (which is shimmed).

Object.freeze(this) is not necessary, but may make it easier for the JS engine to optimize away the extra property lookups required by with () {}.

var x is run in a separate indirect eval context, to create binding for x in global scope. Would also need to deal with function declarations similarly.

livepack_temp2('var x') is inside the nested direct eval() call so it will not execute if the eval-ed code fails to run due to a syntax error.

Just realised, the "cage" approach will prevent super working in direct eval(). Would require transpiling super or passing in functions which execute super(), super.foo, super.foo(), super.foo = x etc.