segmentio/analytics-node

Inconsistent `track` callback behaviour

BrandonNoad opened this issue · 7 comments

See my previous comment here: #231

I may have been incorrect in saying that the callback is being called before the message is actually flushed. That was just an assumption based on the fact that the callback is being called twice and in the first call, batch is undefined.

Here is some code that demonstrates the behaviour I described:

'use strict';

const SegmentAnalytics = require('analytics-node');

const SEGMENT_WRITE_KEY = 'YOUR_SEGMENT_WRITE_KEY';

const segmentAnalyticsClient = new SegmentAnalytics(SEGMENT_WRITE_KEY);

const trackAsync = (message) =>
    new Promise((resolve, reject) => {
        segmentAnalyticsClient.track(message, (err, batch) => {
            if (err) {
                return reject(err);
            }

            console.log('event', message.event);
            console.log('batch', batch);

            // There is a "bug" in the library and this callback is being called twice. `batch` is
            // equal to `undefined` on the first call.
            if (batch !== undefined) {
                return resolve(batch);
            }
        });
    });

const createAccountAndContact = async ({ vendor, user }) => {
    await Promise.all([
        trackAsync({
            userId: vendor.id,
            event: 'upsert-account-by-vendor-uuid',
            properties: { Name: vendor.name },
            integrations: { Salesforce: true }
        }),
        trackAsync({
            userId: user.email,
            event: 'upsert-contact-by-email',
            properties: { LastName: user.lastName },
            integrations: { Salesforce: true }
        })
    ]);

    return trackAsync({
        userId: user.id,
        event: 'relate-contact-to-account',
        properties: {
            // Relate Contact to an Account using an external id.
            Account: { VendorOrgId__c: vendor.id }
        },
        integrations: { Salesforce: true }
    });
};

const vendor1 = {
    id: 'd878f010-22a7-11ec-b114-ed63e4eb602b',
    name: 'Test Vendor 1'
};

const user1 = {
    id: 'e8f97c70-22a7-11ec-b114-ed63e4eb602b',
    email: 'test1@example.com',
    lastName: 'Test1'
};

const vendor2 = {
    id: 'e878f010-22a7-11ec-b114-ed63e4eb602b',
    name: 'Test Vendor 2'
};

const user2 = {
    id: 'f8f97c70-22a7-11ec-b114-ed63e4eb602b',
    email: 'test2@example.com',
    lastName: 'Test2'
};

(async () => {
    try {
        await createAccountAndContact({ vendor: vendor1, user: user1 });

        // The promise above resolves because, "By default, our library will flush the very first
        // time it gets a message".
        console.log('HERE');

        await createAccountAndContact({ vendor: vendor2, user: user2 });

        // The promise above never resolves, so we will not reach here.
        console.log('DONE!');
    } catch (err) {
        console.log(err);
    }
})();

I am using the Salesforce integration, but I doubt the actual messages/events matter.

And here is the output:

event upsert-account-by-vendor-uuid
batch undefined
event upsert-account-by-vendor-uuid
batch {
  batch: [
    {
      userId: 'd878f010-22a7-11ec-b114-ed63e4eb602b',
      event: 'upsert-account-by-vendor-uuid',
      properties: [Object],
      integrations: [Object],
      type: 'track',
      context: [Object],
      _metadata: [Object],
      timestamp: 2021-10-01T11:30:15.947Z,
      messageId: 'node-e5cb6754cc143c059be23307de9094ca-343f8843-895c-429e-8be9-ae3a15aa437a'
    }
  ],
  timestamp: 2021-10-01T11:30:15.951Z,
  sentAt: 2021-10-01T11:30:15.951Z
}
event upsert-contact-by-email
batch undefined
event upsert-contact-by-email
batch {
  batch: [
    {
      userId: 'test1@example.com',
      event: 'upsert-contact-by-email',
      properties: [Object],
      integrations: [Object],
      type: 'track',
      context: [Object],
      _metadata: [Object],
      timestamp: 2021-10-01T11:30:15.951Z,
      messageId: 'node-b79373bb7721c2c217f5b2b76f5459b4-e85261e8-524a-456e-9b19-928a6c6c7976'
    }
  ],
  timestamp: 2021-10-01T11:30:25.958Z,
  sentAt: 2021-10-01T11:30:25.958Z
}
event relate-contact-to-account
batch undefined
event relate-contact-to-account
batch {
  batch: [
    {
      userId: 'e8f97c70-22a7-11ec-b114-ed63e4eb602b',
      event: 'relate-contact-to-account',
      properties: [Object],
      integrations: [Object],
      type: 'track',
      context: [Object],
      _metadata: [Object],
      timestamp: 2021-10-01T11:30:26.350Z,
      messageId: 'node-af0997aa2787ff4b77cc0d743f92e441-c8f112bb-fdac-4b02-a3b7-2c71ecdea715'
    }
  ],
  timestamp: 2021-10-01T11:30:36.351Z,
  sentAt: 2021-10-01T11:30:36.351Z
}
HERE
event upsert-account-by-vendor-uuid
batch undefined
event upsert-contact-by-email
batch undefined
event upsert-account-by-vendor-uuid
batch {
  batch: [
    {
      userId: 'e878f010-22a7-11ec-b114-ed63e4eb602b',
      event: 'upsert-account-by-vendor-uuid',
      properties: [Object],
      integrations: [Object],
      type: 'track',
      context: [Object],
      _metadata: [Object],
      timestamp: 2021-10-01T11:30:36.747Z,
      messageId: 'node-53cdabbe5aacab486a3c10303c1ce8b4-80bb5619-e756-4d79-a0b1-30977dfeb92d'
    },
    {
      userId: 'test2@example.com',
      event: 'upsert-contact-by-email',
      properties: [Object],
      integrations: [Object],
      type: 'track',
      context: [Object],
      _metadata: [Object],
      timestamp: 2021-10-01T11:30:36.748Z,
      messageId: 'node-b67c074815aa03ca382de9ff1c8603bf-d7b6a8ec-8e2f-4ba9-bd63-266694fa3d47'
    }
  ],
  timestamp: 2021-10-01T11:30:46.751Z,
  sentAt: 2021-10-01T11:30:46.751Z
}

As you can see, in the second call to createAccountAndContact, the track callback is only called once for the upsert-contact-by-email event, and so the Promise.all never resolves because of the if (batch !== undefined) check in the callback (the first call to createAccountAndContact actually works as expected because the client doesn't batch on its very first message).

I could remove the if (batch !== undefined) check, and the script would finish, but that may cause issues if I was triggering some code that should only run once in the callback. Also, the batch data is always undefined on the first callback call, and my code could potentially need that value.

Another way to get the script to finish is to set flushAt equal to 1, but then there is no batching.

Ideally, the track callback would only be called once, and the batch value would never be undefined.

Thanks.

The reason your callback is called multiple times is due to #308

The reason DONE! is never printed is because the last messages are still sitting in the queue. Messages are not automatically flushed until the flushAt limit is reached (default 20) or flushInterval is reached (default 10 seconds). So if you were to wait the full 10 seconds you should see it complete.

You can force a flush using segmentAnalyticsClient.flush(). Of course, this would require you to defer awaiting all your promises until after you call flush().

Another option is to disable automatic flushing and rely on manual flushing only as described in #305 (comment)

So if you were to wait the full 10 seconds you should see it complete.

It doesn't matter how long you wait because the promise for the upsert-contact-by-email event never resolves because the callback is only called once for it (with batch equal to undefined).

Ah I see. The fact that the callback is called twice is due to a bug (#308). The callback receives the batch argument when called as the flush function callback. This only occurs when a flush is triggered after queuing the message. batch is not provided (thus undefined) when the callback is called from the queued message callback. The queued message callback is always called.

So if you want to denounce using the presence of batch, you should invert your logic:

if (batch === undefined) {
  return resolve();
}

From the docs:

Batching means that your message might not get sent right away. But every method call takes an optional callback, which you can use to know when a particular message is flushed from the queue.

https://segment.com/docs/connections/sources/catalog/libraries/server/node/#batching

The current behaviour doesn't match what is described in the documentation. I expected the callback to be called after the message is flushed, but that does not always happen. In my case, I am not interested in knowing when the message has been put in the queue.

Are there any plans to change that?

FYI I am not affiliated with Segment or analytics-node. I am dealing with these same frustrating issues and I am just reporting what I am finding.

The callback is always called and it is always after the message has been flushed (sent to the server). Due to a bug, sometimes it is called a second time, but again, both times are always after the message has been flushed.

Once from line 266 and sometimes again from 267:

analytics-node/index.js

Lines 265 to 268 in 70f24fe

const done = err => {
callbacks.forEach(callback => callback(err))
callback(err, data)
}

The callback is always called once with an undefined batch argument (line 266). The callback is ALSO sometimes called a second time with a non-undefined batch argument (line 267). So your solution is close, but if you want to ignore the second callback invocation caused by the bug, you should ignore invocations when batch is not undefined.

@yo1dog Sorry, I thought you were part of the Segment team. Thanks for all your help/feedback.

you should ignore invocations when batch is not undefined

Yes, that seems to produce the correct behaviour that I am looking for. And I don't actually need the batch data. I'm not sure why I was under the impression that it was being called with undefined when it was put into the queue (instead of when flushed). Likely because of the missing batch data.

It seems to me that they should just get rid of line 267 and pass data to the callback calls on line 266.

I am going to close this issue since it is essentially a duplicate of #308 and #231.