nodejs/promises

AsyncWrap

petkaantonov opened this issue · 24 comments

Native promises don't have hooks in V8 so interop with AsyncWrap is non-trivial.

  • Should it block the PR?
  • Should it block unflagging?
  • Should it be hacked in, patched in V8, or handled with our own floating patch, etc.?

@trevnorris

From prior discussions the direction we were heading with this was:

  • It does not block the PR from landing, but from being unflagged.
  • It blocks AsyncWrap from being exposed (since native Promises anywhere in a dep tree break AsyncWrap).
  • We would try to expose the microtask queue to Node so that we can instrument it properly. This should land upstream in V8, and Node would float the patch.

/cc'ing @ofrobots as well, as I've pinged him to take a look at the proposed approach.

I'm labeling this as "blocks unflagging" for now, but will change it if folks object.

๐Ÿ‘ @chrisdickinson this sounds reasonable.

It blocks AsyncWrap from being exposed

Is this to say that making AsyncWrap public API is blocked because we don't have the API available to make it work correctly with the MicrotaskQueue?

@trevnorris That was my understanding, yes: a public AsyncWrap API user wouldn't be able to rely on back-associating asynchronous events when native Promises appear anywhere in their extended dependency chain, at least not without modifications to the microtask queue. Object.observe would also break AsyncWrap's contract because of the MicrotaskQueue.

@chrisdickinson I'd never planned on holding off making AsyncWrap public over points we can't control. With the work you and others have done to fix this there may be a solution in the near to mid term, but unlike timers and nextTick (which are also not currently supported, but are on me to implement) there isn't a directly supported API I can use.

This being said, it's not like AsyncWrap would go public tomorrow, or even next month, but I have a feeling that there will always be something in the pipe that won't work properly with AsyncWrap (e.g. if those strange Promise-based streams of muddy black magic ever come to node).

@trevnorris Understandable. Would it be fair to say that ideally we'd have a resolution for MicrotaskQueue control before exposing AsyncWrap? (Notably: having a solution for the microtask queue would solve the problem of WHATWG streams as well, since those are based on top of Promises.)

Just as an FYI, I've a PR open against AndreasMadsen's "async-hook" wrapper for a Promise patch (AndreasMadsen/async-hook#2). It's obviously incredibly lacking, but was how I attempted to tackle it with what's available.

Very cool to see the microtask work happening.

@chrisdickinson a custom microtask queue wouldn't be enough if we want to capture context at the time of the .then() call.

@chrisdickinson

Would it be fair to say that ideally we'd have a resolution for MicrotaskQueue control before exposing AsyncWrap?

That would be ideal, and would also address the core API gap of queueing native callbacks in a nextTick way (i.e. void EnqueueMicrotask(MicrotaskCallback microtask, void* data = NULL);).

As an aside, how would AsyncWrap work with the MicrotaskQueue executor from the native side? I believe some enhancements will need to be made in AsyncWrap, and I'm willing to work that out once this moves forward.

@vkurchatkin At this point I'm not sure we must do that. .then handlers added synchronously to Node Promise API calls should automatically run in the parent's domain:

const domain = require('domain')
const fs = require('fs') // fs.readFilePromise / fs.existsPromise below are hypothetical promisified calls

const d = domain.create()
d.run(() => {
  return fs.readFilePromise()
}).then(data => {
  // runs in d, readFilePromise's executor was run in d, queued `readFile` in d, microtask
  // enqueued by callback of `readFile`
  return fs.existsPromise() // executor + callback are bound to d, resolution runs in d
    .then(() => {
      // still bound to d, since the microtask is queued inside of `existsPromise`
    })
}).then(otherData => {
  // runs in d, since resolution of dependent promise of `existsPromise` is
  // run in d and enqueues microtask there
})

// escapes:
const p = d.run(() => {
  return fs.readFilePromise()
})

setTimeout(() => {
  p.then(() => {
    // not run in d; microtask enqueued by "p.then"
  })
}, 500)

Only handlers added after the resolution of a Node core-returned Promise would pick up the domain of their enclosing scope. While this is probably going to be a bit difficult to document, I believe it should be workable behavior.
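
To sketch that last point (still assuming the hypothetical fs.readFilePromise and the domain semantics described above, not current behavior):

const domain = require('domain')
const fs = require('fs')

const d1 = domain.create()
const d2 = domain.create()

// hypothetical promisified call, as above
const p = d1.run(() => fs.readFilePromise())

setTimeout(() => {
  // by now p has presumably resolved; a handler added here, inside d2,
  // enqueues its microtask from d2's context and so runs bound to d2, not d1
  d2.run(() => {
    p.then(() => {
      // runs in d2
    })
  })
}, 500)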

/cc @nodejs/tracing as I completely missed this myself.

@gsathya Sorry for the delay. While hiding from the internet and focusing on the implementation, I realized my previous ask was incorrect, and what's actually needed is simpler. Hopefully you agree.

Hi @trevnorris, commenting here so as not to clutter the V8 issue. I believe the way you're describing the integration of Promises differs from what at least I'd like to see (justification follows). Maybe it's a limitation with how microtasks are scheduled, but I figured I should at least say something for there to be any chance of change.

Per your example:

setTimeout(function timeout01() {
  const p = new Promise(function promise01(res) {
    setTimeout(function timeout02() {
      res('foo');
    });
  });

  p.then(function then01(val) {
    return 'bar';
  });

  p.then(function then02(val) {
    return 'baz';
  });
});
- timeout01 - init

- timeout01 - before
- timeout02 - init
- timeout01 - after

- timeout02 - before
- res       - init
- timeout02 - after

- res       - before
- (call then01)
- res       - after
- res       - before
- (call then02)
- res       - after

From here we could say any resolved promise is a handle, whose before/after runs for each PromiseReactionJob. And thus if we scheduled a reaction later, we would still find ourselves with that resolved promise as the parent.

setTimeout(function timeout03() {
  p.then(function then03(val) {
    return 'quux';
  });
}, 3600 * 1000);
- timeout03 - before
- timeout03 - after

- res       - before
- (call then03)
- res       - after

The problem I find with this is when it comes to things such as request tracking in an APM / async context à la CLS. If you ever share a Promise between two requests, you jump contexts. Here's a simple example (where I make assumptions about a working version of nodejs/node#5573, which I haven't gotten back to - sorry):

const http = require('http');

let promiseToCachedThing = null;
function getThingAndCacheItAsPromise() {
  if (promiseToCachedThing === null) {
    promiseToCachedThing = new Promise(function promise01(resolvePromise01) {
      setTimeout(function timeout01() {
        resolvePromise01(Buffer.from('howdy!', 'utf8'));
      });
    });
  }

  return promiseToCachedThing;
}

http.createServer(function requestHandler(req, res) {
  getThingAndCacheItAsPromise()
    .then(function returnCachedThing(thing) {
      res.statusCode = 200;
      res.end(thing);
    });
}).listen(0);
- request01        - init - #1
- request01        - before
- (call requestHandler)
- timeout01        - init - #2 (parent #1)
- request01        - after

- timeout01        - before
- resolvePromise01 - init - #3 (parent #2)
- timeout01        - after

- resolvePromise01 - before
- (call returnCachedThing) - (executing in #3)
- resolvePromise01 - after

- request02        - init - #4
- request02        - before
- (call requestHandler)
- request02        - after

- resolvePromise01 - before
- (call returnCachedThing) - (executing in #3)
- resolvePromise01 - after

In this case we've completely lost our context of request02. If we go back to our original example and change the timing to be like this instead:

- timeout01 - init

- timeout01 - before
- timeout02 - init
- then01    - init
- then02    - init
- timeout01 - after

- timeout02 - before
- timeout02 - after

- then01    - before
- then01    - after
- then02    - before
- then02    - after

Then our multi-request example ends up looking like so:

- request01            - init - #1
- request01            - before
- (call requestHandler)
- timeout01            - init - #2 (parent #1)
- returnCachedThing[1] - init - #3 (parent #1)
- request01            - after

- timeout01            - before
- timeout01            - after

- returnCachedThing[1] - before
- (call returnCachedThing[1]) - (executing in #3)
- returnCachedThing[1] - after

- request02            - init - #4
- request02            - before
- (call requestHandler)
- returnCachedThing[2] - init - #5 (parent #4)
- request02            - after

- returnCachedThing[2] - before
- (call returnCachedThing[2]) - (executing in #5)
- returnCachedThing[2] - after

I hope that makes sense. Do you have thoughts on that? Do you think it would be a possibility?

Thanks

@omsmith Thanks for the feedback. While revising my model of how Promises should work with async hooks I decided to model it on how callbacks and events are currently evaluated. While doing so I realized that it may need to be better communicated that AsyncWrap exists to track handle ownership, i.e. how did this handle come to exist?

Let's take a quick look at node's API as a reference:

const net = require('net');

let server = null;

setTimeout(function timer01() {
  server = net.createServer().listen(8080);
}, 10);

setTimeout(function timer02() {
  server.on('connection', c => {
    console.log('connection made');
  });
}, 100);

Both timer01 and timer02 will have their own unique id. server will have the parentId of timer01(). Now when the 'connection' event fires a TCPWrap is created. Thus triggering init() to be called. Would you expect the parentId passed to the init() of that new connection to be that of the server or timer02?

Thus far the API has evaluated the parentId as the handle responsible for the new class' construction. In this case that would be server. Before we proceed to discuss Promises, is that how you'd expect async hooks to work?

I would definitely expect the parent to be server in that example. I like exactly what you said: Thus far the API has evaluated the parentId as the handle responsible for the new class' construction.

I suppose what I'm asking is for a PromiseReaction to also be a handle which is created when then is called. The resolution of a promise (- res - init) may still make sense as its own handle as well, which might pre/post around the PromiseReactionJob.

@omsmith Thanks. The first time I spec'd how Promises should interact with async_hooks it was very similar to what you described. Though as I was finishing up implementing the embedder API I realized there was an issue with maintaining an asynchronous stack trace.

Example code:

'use strict';

const crypto = require('crypto');
let p = null;

setTimeout(function sT01() {
  p = new Promise(function executor(res) {
    crypto.randomBytes(1024, function bytes(err, data) {
      res(data);
    });
  });
}, Math.random() * 60 * 60 * 1000);  // max of 1 hour

setInterval(function sI02() {
  p.then(function thenFn(val) {
    console.log(val.toString('hex'));
  });
  clearInterval(this);
}, 10 * 1000); // check every 10 seconds

Make the assumption that we're recording the stack trace every time init() is called. If we follow the proposal that init() should run on the call to .then(), the async call stack when thenFn() is called would be:

thenFn()
sI02()
(global)

You can see that there's no record of the call to randomBytes() or the other calls. But if init() is called on res() the stack would be:

res()
bytes()
executor()
sT01()
(global)

The rationale here is that .then() functionally operates much like .on() as far as the user's concerned. The main difference from how .on() is traditionally used is that resolved values hang around for the lifetime of the Promise, so that any previous or future call to .then() receives that value. Whereas event emitters usually call all event callbacks at the time of emit, but don't bother after.

Though I label this event emitter behavior an implementation detail. The following is not pretty, but does demonstrate a simple event emitter that behaves in a similar way to a .then():

'use strict';

function MyEmitter() {
  // Use Map so the event can be anything.
  this.listeners = new Map();
  this.events = new Map();
}

MyEmitter.prototype.on = function on(event, listener) {
  if (!this.listeners.has(event))
    this.listeners.set(event, [listener]);
  else
    this.listeners.get(event).push(listener);
  // Now check if events already exist for this listener
  if (this.events.has(event)) {
    for (let value of this.events.get(event))
      listener.call(this, value);
  }
};

MyEmitter.prototype.emit = function emit(event, value) {
  // Call previous listeners if they exist for the event.
  if (this.listeners.has(event)) {
    const listeners = this.listeners.get(event);
    for (let listener of listeners)
      listener.call(this, value);
  }
  // Then add the event to the event map.
  if (this.events.has(event)) this.events.get(event).push(value);
  else this.events.set(event, [value]);
};


const me = new MyEmitter();

me.on('foo', function onFoo(value) { console.log('onFoo', value) });
me.emit('bar', 'emit bar');

me.on('bar', function onBar(value) { console.log('onBar', value) });
me.emit('foo', 'emit foo');

setTimeout(() => {
  me.on('foo', val => console.log('foo timeout', val));
  me.on('bar', val => console.log('bar timeout', val));
}, 1000);

Of course another difference is that a Promise executor is only ever resolved once, so it's impossible for two .then() calls to receive different values, but hopefully we can overlook that.

Anyway, to recap: the short of it is that I view .then() calls as equivalent to .on() calls, and by doing so we're able to maintain the proper stack. init() is called when the Promise handler returns, since that value propagates to the next chained .then(). Using this method we can successfully see all linked .then() calls.
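
As a rough illustration of that timing (the comments describe the intended model, not behavior that exists today):

'use strict';

const p = new Promise(function executor(res) {
  setImmediate(function immediate01() {
    res('foo');               // under this model, init for then01's reaction
  });                         // is emitted here, at the call to res()
});

p.then(function then01(val) { // no init at the .then() call itself
  return val + 'bar';         // init for then02's reaction is emitted when
})                            // then01 returns, keeping the chain linked
.then(function then02(val) {
  console.log(val);           // 'foobar'; the async stack still reaches back
});                           // through res(), immediate01() and executor()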

Thoughts?

Qard commented

Another way to put it is that the linkage async_wrap is intended to provide marks the edges of synchronous JS execution. The init should occur at the closest point to which execution leaves JS to create an underlying async task of some variety, and the before and after should generally map closely to the start and end of a complete JS tick. The only exceptions to that rule are where the embedder API has been used to capture continuations around any sort of user-mode queueing or connection-pooling behaviour, timers for example.

By pushing the event trigger timing as close as possible to the async edges, it ensures the most complete encapsulation of the JS execution within a given context.

If you have 10 steps of process for every purchase request in a company, who's responsible for the work that's created from it: the one submitting the purchase request, or the one who defined that step of the process? :)

I could see .on() and .then() going either way, and both make sense to me, although the current direction obviously better suits async_wrap's primary goals. Just unfortunate it won't be everything for everyone, ha. I'd love this to work to provide a request / transaction context, which it certainly still may be able to. I suppose one would just have to avoid a pattern of Promise reuse across requests (and in dependencies). At least that's not a huge lift.
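
For instance, the cached-promise example from earlier could be reworked to cache the value rather than the promise, so each request's reaction is rooted in its own context (sketch only; it ignores the race where several requests miss the cache at once):

let cachedThing = null;

function getThingAsPromise() {
  if (cachedThing !== null) {
    // resolved synchronously inside the caller, so the reaction is rooted
    // in the current request rather than in a shared, long-lived promise
    return Promise.resolve(cachedThing);
  }
  return new Promise(function promise01(resolve) {
    setTimeout(function timeout01() {
      cachedThing = Buffer.from('howdy!', 'utf8');
      resolve(cachedThing);
    });
  });
}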

I appreciate the insightful and very respectful discussion @trevnorris and @Qard. It's especially nice to know you at least had the same train of thought at some point!

Qard commented

Transaction tracing is definitely an important purpose for async_wrap. We're trying to keep the core API surface for that use very minimal though and leave it up to userland to implement the rest. It's not too difficult to do, once you have a working continuation-local-storage implementation. I'm sure once async_wrap lands officially, there'll be lots of work going into improving the userland tracing story. 😸
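
To give a feel for it, a bare-bones continuation-local-storage layer on top of the init/before/after/destroy callbacks discussed in this thread might look roughly like this (how the callbacks get registered is left out, and the names are illustrative):

'use strict';

const contexts = new Map();
let activeContext = null;

function init(id, parentId) {
  // a new handle inherits whatever context its parent was created under
  contexts.set(id, contexts.has(parentId) ? contexts.get(parentId) : activeContext);
}
function before(id) {
  activeContext = contexts.get(id) || null;
}
function after(id) {
  activeContext = null;
}
function destroy(id) {
  contexts.delete(id);
}

// userland API: run a function with a fresh context, and read it back from
// any asynchronous continuation that descends from that call
function runWithContext(data, fn) {
  const previous = activeContext;
  activeContext = data;
  try {
    return fn();
  } finally {
    activeContext = previous;
  }
}
function getContext() {
  return activeContext;
}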

async_hooks now has promise support, see PR: nodejs/node#13000

hayes commented

It looks like something changed between when this issue was last discussed here, and what actually ended up in 8.0.0.

Here is an example:

const async_hooks = require('async_hooks');
const fs = require('fs')

let indent = 0;
async_hooks.createHook({
  init(asyncId, type, triggerId, obj) {
    const cId = async_hooks.currentId();
    fs.writeSync(1, ' '.repeat(indent) +
                    `${type}(${asyncId}): trigger: ${triggerId} scope: ${cId}\n`);
  },
  before(asyncId) {
    fs.writeSync(1, ' '.repeat(indent) + `before:  ${asyncId}\n`);
    indent += 2;
  },
  after(asyncId) {
    indent -= 2;
    fs.writeSync(1, ' '.repeat(indent) + `after:   ${asyncId}\n`);
  },
  destroy(asyncId) {
    fs.writeSync(1, ' '.repeat(indent) + `destroy: ${asyncId}\n`);
  },
}).enable();

new Promise(function (resolve) {
  setTimeout(function () {
    fs.writeSync(1, ' '.repeat(indent) + `current id at resolve: ${async_hooks.currentId()}\n`);
    fs.writeSync(1, ' '.repeat(indent) + `trigger id at resolve: ${async_hooks.triggerId()}\n`);
    resolve();
  });
}).then(() => {
  fs.writeSync(1, ' '.repeat(indent) + `current id in then: ${async_hooks.currentId()}\n`);
  fs.writeSync(1, ' '.repeat(indent) + `trigger id in then: ${async_hooks.triggerId()}\n`);
});

which prints:

PROMISE(2): trigger: 1 scope: 1
Timeout(3): trigger: 1 scope: 1
TIMERWRAP(4): trigger: 1 scope: 1
PROMISE(5): trigger: 1 scope: 1
before:  4
  before:  3
    current id at resolve: 3
    trigger id at resolve: 1
  after:   3
after:   4
before:  5
  current id in then: 0
  trigger id in then: 0
after:   5
destroy: 3

Given this, I don't see any way to link the context at the time of resolve (3) to the execution of the then callback (5). I would have expected the trigger id during the callback passed to then (5) to be 3.

Isn't this what nodejs/node#13367 is about?

hayes commented

Yes, sorry. I missed it initially, because I think the ticket/PR was initially about something different, but looking through it, it looks like the above use case gets touched on further down in the comments.