omphalos/thermite

optimizing map reads via event emission

moshewe opened this issue · 16 comments

Hi,

First of all, when I read about thermite my functional programming alter ego was in tears of joy. GREAT WORK!
Second: it says in the README that you check against the function map for function version - why not make the swappable code listen for change events on the function map and swap accordingly? This will save an if at runtime.

Haha, thanks for the kind words @moshewe :)

Regarding using notification-based hot swapping, the biggest issue with that, I think, is garbage collecting orphaned functions. We'd have to track each closure in a collection of some sort to be able to notify them when they need to be hot-swapped. Although, if this collection were a WeakMap, then I guess we would be protected from memory leaks.

There is a potential downside - hot swapping could take longer this way. For example, if I create 1,000 functions but only use 10, by hot swapping lazily saves a lot of unnecessary work. Still each time each function is called, it incurs the extra overhead of the map lookup and if check.

Maybe that's okay - hot swapping can be a less efficient but individual function calls would be faster. To me that sounds like a reasonable trade-off. I don't know - what do you think?

I think there's a different way to implement the notification based
swapping:
A map holds the CURRRENT implementation of each function (by name). When
you create a new swappable function, it registers in the map in an observer
pattern. When you want to swap the function, all you do is change the
function in the map and then notify all observers.
No IFs, no extra table lookup...

On Fri, Nov 20, 2015 at 4:10 AM, omphalos notifications@github.com wrote:

Haha, thanks for the kind words @moshewe https://github.com/moshewe :)

Regarding using notification-based hot swapping, the biggest issue with
that, I think, is garbage collecting orphaned functions. We'd have to track
each closure in a collection of some sort to be able to notify them when
they need to be hot-swapped. Although, if this collection were a WeakMap,
then I guess we would be protected from memory leaks.

There is a potential downside - hot swapping could take longer this way.
For example, if I create 1,000 functions but only use 10, by hot swapping
lazily saves a lot of unnecessary work. Still each time each function is
called, it incurs the extra overhead of the map lookup and if check.

Maybe that's okay - hot swapping can be a less efficient but individual
function calls would be faster. To me that sounds like a reasonable
trade-off. I don't know - what do you think?


Reply to this email directly or view it on GitHub
#3 (comment).

@moshewe I think we're describing the same thing :) I may be missing something though ...

In the general case this should be faster, but I can imagine a "pathological" case where it's much slower.

Something like:

var hotSwappableCode = thermite.eval('(' + function() {
  var functions = []
  for(var i = 0; i < 1000000; i++)
    functions.push(function() { return 1 })
  return functions
} + ')()')

hotSwappableCode.hotSwap('(' + function() {
  var functions = []
  for(var i = 0; i < 1000000; i++)
    functions.push(function() { return 2 })
  return functions
} + ')()')

In this case, even though there are no calls to the hot swapped functions, I'd expect significant overhead here if we implement hot swapping using an observable map. Whereas with the current implementation, functions are only hot-swapped on an as-needed basis.

I guess I'm leaning toward keeping things the way they since thermite is intended to be general-purpose, and accordingly it probably shouldn't fall victim to pathological cases.

Still, it might be worthwhile to come up with an experimental branch in the repo that implements hot swapping with an observable map, in case people want to experiment with it.

I fail to see how your example creates overhead. Why not hide the internal function map completely?

The clarity and performance justify conceding to SOME pathological cases IMO.

I fail to see how your example creates overhead. Why not hide the internal function map completely?

When using an observable map, the overhead is that each individual function reference is replaced at hot swap time, even though they aren't invoked. In the lazy case, only function references that are invoked are ever replaced. I don't see how hiding the internal function map affects this performance difference.

The clarity and performance justify conceding to SOME pathological cases IMO.

I'm not sure what you mean by clarify. I think it's worthwhile to come up with a branch that tests the performance difference though.

Replacing each reference, assuming hotswapping is significantly more rare than invocation, is more efficient IMO. It saves lots of IFs at times where you want speed.

I just might test performance in a separate branch sometime... :)

How about a function, that compiles the new proxied function? It's still lazy but no need to check if there's a new version available every call.

__fn = (function(code) {
  return function() {
    __fn = eval(code);
    return __fn.apply(this, arguments);
  };
}(code));

proxy function would still look the same

function __proxy() {
  return __fn.apply(this, arguments);
}

@caspervonb in the case you're describing, if the user of the library performs a hotSwap with new code, how does __fn know that it needs to recompile a new version?

with new code, how does __fn know that it needs to recompile a new version?

It doesn't, you would swap it with the "evaluator" while doing the traversal in hotSwap, instead of setting code an new version in the map just evaluate the function that will lazy evaluate the real function later, there and then.

e.g, this would be stuck in somewhere around https://github.com/omphalos/thermite/blob/master/thermite.js#L88 and replace updating the map code/version all together.

var code = /* ... */;

if (functionDidChange || functionIsNew) {
  var evaluateTheEvaluator = [
    __thermiteFunction_$thermiteBlockID = function() {
      __thermiteFunction_$thermiteBlockID = eval(code);
      __thermiteFunction_$thermiteBlockID.apply(this, arguments);
    };
  ].join('\n')
    .replace(/$thermiteBlockID/, blockID);
  }

  eval(evaluateTheEvaluator);
}

@caspervonb I think I see what you're saying. Currently the map tracks code versions. I do lazy evaluation because I the function itself to do eval with its version of eval to get all the variables in the function's scope. In the current version of __thermiteMap I don't have references to all the functions themselves in the map, just a reference, essentially, to the code that belongs in each function.

The way I understand @moshewe's suggestion is that we could keep each function (not just its code but a reference to each function, along with a reference to that function's eval) inside __themiteMap. Then when it comes time to hotswap, we could just iterate over the functions in __thermiteMap and update their code that way. (The only consideration would be that we need to use a WeakMap to prevent memory leaks when function references get orphaned.)

I think the biggest performance saving in the WeakMap version would be that we could avoid doing a version check on each function invokation.

If we're going into ES6 territory, we can use the new Map interface to store function-observer-wrappers by name - my experience shows that it's much faster than a regular object. WeakMap will also be a nice optimization, if we assume functions are not often removed and re-inserted into the table. Frequent remove/insert of functions will make a WeakMap's keys/memory build up and clear up repeatedly, which is time consuming.

@moshewe

If we're going into ES6 territory, we can use the new Map interface to store function-observer-wrappers by name - my experience shows that it's much faster than a regular object.

That's makes sense 👍

WeakMap will also be a nice optimization

I think it would have to be necessary to use WeakMap if we use eventing to hotswap. When we trigger an event, we have to have references to all event listeners so they can be notified of the event occurring. And in order to avoid memory leaks (when particular function references go out of scope) all these function references would have to be stored in a WeakMap so they could be garbage collected when they go out of scope.

With the current design (each function reference lazily upgrades itself) this doesn't happen, but if we moved away from that then I think it's not possible to do so without either WeakMap (unless we wanted to introduce memory leaks, which I would consider to be a bug).

Just FYI, I figured my previous train of thought earlier here would not work, as the closure could be lost, which it does.

var code = [
  'function() {',
  '  return value;',
  '}',
].join('\n');

var __function = function() {
  __function = eval('(' + code + ')');
  __function.apply(this, arguments);
};

var fn = function() {
  __function.apply(this, arguments);
};

var result = eval([
  'var value = 0;',
  'fn()',
].join('\n'));

console.assert(result == 0);

So back to to the drawing board, I'm curious to how you envision events working out here? what would the proxy look like?

@caspervonb @moshewe I was thinking about something like the following. However as I've researched it some more it seems that it's not possible to iterate over a WeakMap, therefore I don't think this method can work.

var __map = new WeakMap

var value = 3

// Create the updateable function
var proxyFn = (function() {

  // Create a unique identifier for our function
  // Each function proxy will have to have its own unique identifier
  var __id = {}

  __map.set(__id, {
    evaler: function(x) { return eval(x) }, // Save the reference to eval
    fn: function() { return value } // Create the initial function
  })

  // Delegate to the latest version of the function in the WeakMap
  return function() {
    return __map.get(__id).fn.apply(this, arguments)
  }
})()

// Instead of the proxy function lazily updating itself,
// Let's update all functions in the WeakMap
// (Note this is impossible because it turns out WeakMaps can't be iterated)
for(var entry of __map.values()) {
  entry.fn = entry.evaler('(function() { return value * value })')
}

// This prints 9
console.log(proxyFn())

We could use a Map here (or regular ES5 hash) to track the functions we want to update, but we can create memory leaks. For example:

setInterval(function() {
  var valueToLog = (function() { return 123 })()
  console.log(valueToLog)
}, 1000)

If this code were hotswapped, the function proxy would look something like this:

setInterval(function() {
  var valueToLog = (
    (function() {
      var __id = {}
      __map.set(__id, {
        evaler: function(x) { return eval(x) },
        fn: function() { return 123 }
      })
      return function() {
        __map.get(__id).fn.apply(this, arguments)
      }
    })()
  )()
  console.log(valueToLog)
}, 1000)

In this case every time the timer fires, a new entry would get created in __map and the entries would never get removed, creating a memory leak.

So since WeakMap is not iterable, it cannot work for this pattern, and since Map and regular ES5 hashes lead to memory leaks, we cannot use those either. Therefore the only way that I know how to implement the hotswapping functionality is with the current way that thermite does it - with function proxies updating their implementations lazily.

If you guys can convince me otherwise, I'll leave this issue open but currently I'm thinking we have to close it, since event emission is unworkable (WeakMap) or leads to memory leaks (Map and ES5 hash).

@moshewe @caspervonb I'm going to close this issue. In case either of you figure out how to make this workable we can probably reopen it and run a perf test comparing the two approaches.