puzpuzpuz/cls-rtracer

Set ID manually

SimenB opened this issue ยท 15 comments

Our app does some periodic polling, which is started by a setInterval and not an external call to our app. This call is missing a trace id. Would it be possible for us to manually set an ID before kicking of the async calls?

Or code is essentially this

export async function setupBroadcastPolling(client: SomeClient) {
  async function loadMessages() {
    try {
      const data = await loadBroadcastMessages(client);

      if (data) {
        broadcastMessages = data.messages;
      }
    } catch (error) {
      logger.error({ error }, 'Unable to poll broadcast messages');
    }
  }

  // reload every minute
  setInterval(loadMessages, 60 * 1000).unref();

  await loadMessages();
}

It would be great if we could have an ID in our logs in this case as well. In this case I'd love to somehow set an ID if it's missing within the loadMessages function. Our client does a bunch of logging as it works, which are all missing the request ID

Would it be possible for us to manually set an ID before kicking of the async calls?

@SimenB
This sound a bit orthogonal with the main use case of cls-rtracer, as these async calls seem to happen outside of requests (and web framework, in general).

Do you have multiple of such polling tasks? If it's only a single one per node.js app, why do you need to have an ID assigned to it? I'd like to understand the use case better before suggesting anything.

Our use case for these IDs in general is to use them as trace IDs in logs. We pass them on to the backend for every call as well, and every log statement in both the node app and the backend logs it out. That way we can trace a call through the whole stack. The client also passes it (and is picked up by the middleware provided by this module), so we can associate client request with db queries all the way through.

But since we cannot set the the id for non-request bound async work, we cannot trace that "request" through the system

So, something like the following API would work for you?

rTracer.runWithId(() => {
  const id = rTracer.id() // a new id will be available here
})

And the function will be runWithId(fn[, id]), where id is an optional override for the id available in the function.

If accessing rTracer.id() inside any async functions called by rTracer.runWithId() then yeah, that'd work ๐Ÿ™‚

@SimenB then #31 should add the feature you need. ๐Ÿ˜‰

Perfect, thanks! I'll test it right away

I've just published v2.1.0 which includes this feature. Let me know if you face any issues.

It also includes the Fastify plugin.

I forgot to report back - this worked wonderfully! Thank you so much for being so responsive and helpful here, and the work you and others do on ALS!

@SimenB I'm glad to hear that the feature works fine. Thanks for your feedback!

@puzpuzpuz I've found one issue with this. For some reason, the very first request a client makes doesn't get an ID. I haven't dug into it (tried reproducing in a bare bones app without success), but I just noticed that those requests have been getting a trace id recently, but that ID is the same one as the one provided during boot. We have essentially this code

import rTracer from 'cls-rtracer';
import start from '@private/express-starter';
import setupApp from './server';

async function main() {
  await rTracer.runWithId(async () => {
    const app = await setupApp();
    await start(app);
  });
}

This gives a trace ID for all logging during startup, which is very nice. However, this ID is kept even after the function completes, and used in later requests which do not have an ID.

Any idea about what's going on? While the issue would probably go away if I figured out why the first request is missing an ID, shouldn't the old one be "garbage collected" when the function provided to runWithId completes?

@SimenB hmm, sounds weird. I'll take a look at this issue later today.

Thanks! I'm having troubles reproducing this locally, it seems to only happen when deployed, for some reason... Will keep on trying to get a reproduction locally

@SimenB do you use the same node.js version locally and in production? Which one is that?

12.18.0 in prod, 12.18.1 locally. It works consistently locally with both 12.17.0 and 12.18.1. Hoping it's not something weird with the load balancers in GCP or something...

Any idea about what's going on? While the issue would probably go away if I figured out why the first request is missing an ID, shouldn't the old one be "garbage collected" when the function provided to runWithId completes?

@SimenB the ID will be available only within the function passed into runWithId() method. But if somehow the request is being handled within async call chain that corresponds to the function (which is weird), then the ID may be available there. But it sounds really weird to me and I'd appreciate if you had a reproducer.