Repeatable job never gets processed sometimes

Question

malisetti opened this issue 5 years ago · 23 comments

I have observed that a repeatable job sometimes never gets picked up for processing. When this happens (it occurs randomly), getRepeatableJobs shows a next value that is always less than the current timestamp, and both getDelayedCount and getPausedCount return 0.

Minimal, Working Test code to reproduce the issue


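const Queue = require('bull');

const helloQueue = new Queue('hello-queue', 'redis://127.0.0.1:6379');

(async () => {
    // Processor uses the callback style, so it must call done() when finished.
    helloQueue.process((job, done) => {
        console.log('hello world');
        done();
    });

    // Register a repeatable job that should fire every 2 minutes.
    await helloQueue.add(null, {
        repeat: { cron: '*/2 * * * *' },
    });

    // Every 10 seconds, print the queue state next to the current time.
    setInterval(async () => {
        console.log('current timestamp:', Date.now());
        console.log(await helloQueue.getRepeatableJobs());
        console.log('getDelayedCount:', await helloQueue.getDelayedCount());
        console.log('getPausedCount:', await helloQueue.getPausedCount());
    }, 10 * 1000);
})();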

Output:

current timestamp: 1589657543148
[
  {
    key: '__default__::::*/2 * * * *',
    name: '__default__',
    id: null,
    endDate: null,
    tz: null,
    cron: '*/2 * * * *',
    every: null,
    next: 1589657520000
  }
]
getDelayedCount: 0
getPausedCount: 0

Create a `hello world` cron job which prints to console every 2 minutes

Bull version: "bull": "^3.13.0"

Additional information

The next timestamp is lower than the current timestamp, and "hello world" is not printed every 2 minutes.

Answer 1 · 2020-05-16T19:59:49.000Z

This sometimes happens with the every repeat option too.

1589659117238
[
  {
    key: '__default__:::120000',
    name: '__default__',
    id: null,
    endDate: null,
    tz: null,
    cron: null,
    every: 120000,
    next: 1589659080000
  }
]
0
0

Answer 2 · 2020-05-17T09:11:21.000Z

Same issue occurs with https://github.com/taskforcesh/bullmq also

Answer 3 · 2020-06-12T14:39:49.000Z

I am facing the same problem. At some point the delay counter is negative and the repeatable job is no longer scheduled.

Answer 4 · 2020-06-13T15:58:27.000Z

One observation: if the current run didn't handle any jobs, restarting the worker process picks them up.

Answer 5 · 2020-07-02T13:11:19.000Z

Maybe this is fixed by 10a9eae since version 3.14.0+?

Answer 6 · 2020-07-03T08:05:49.000Z

Tried with "bull": "^3.15.0"; it does not seem to be fixed.

Answer 7 · 2020-07-06T13:40:45.000Z

We're using Bull for repeated jobs only. About 1-2 weeks ago I saw the behavior you described. Jobs were working fine for some time but suddenly processing stopped. Since we upgraded to 3.15.0 we didn't see this behavior anymore.

I set up a minimal example with your code, Node 12.18.1 and version 3.15.0 of Bull locally. Until now (about 3 hours) everything is working fine. How long do you usually have to wait until "hello world" isn't printed anymore?

Answer 8 · 2020-07-07T03:48:20.000Z

The issue I observed is not about "jobs not running after sometime". I am facing an issue where if I restart the worker process, it will not pick up any jobs to process. So each start of the worker process behaves differently. Hope I am clear. Let me know your thoughts.

Answer 9 · 2020-07-07T06:35:35.000Z

Same issue here, I also experience this issue (related to repeated jobs) where I try to clear the queue of them before adding: #1792.

Answer 10 · 2020-07-17T12:23:23.000Z

Ran into this issue as well; the every option, and repeatable options in general, are completely unusable due to this imo. Had to resort to using my own scheduler with a job-in-queue check 😕

Answer 11 · 2020-07-17T21:43:14.000Z

@fspoettel can you be more explicit, what is "completely unusable" ? can you provide a code example that reproduces the issue?

Answer 12 · 2020-07-17T21:44:35.000Z

Guys, if you can provide a use case that reproduces the issue I can look into it; as it currently stands, this issue is not reproducible. Also note that repeatable jobs are working fine for many people, so this must be some edge case. I would love to look into it, but I need some code I can use to reproduce it.

Answer 13 · 2020-07-17T23:40:40.000Z

@fspoettel can you be more explicit, what is "completely unusable" ? can you provide a code example that reproduces the issue?

Yes, I can produce a test case for this, give me some time to isolate it from the project I'm using bull in. I'll try to provide a repo with an isolated example.

I ran into this problem not too long ago. It boils down to the queue not respecting the order of repeatable tasks when a number of tasks are registered in a very short time frame. Some tasks are executed multiple times in a row while others are "stuck" in the delayed state. The problem gets worse if the rate limiting option is involved.

Answer 14 · 2020-07-18T07:43:31.000Z

Great, I will look into it as soon as you provide the test case.

Answer 15 · 2020-07-20T06:06:49.000Z

@manast hi, thanks for this library. I hope this issue is clear to you. I have provided the sample code to reproduce this issue in the first comment.

Answer 16 · 2020-07-20T08:58:14.000Z

@manast This is a minimal example of what I ran into (edit: fix and add rate limiter which actually causes the issue)

const Queue = require('bull');
(async () => {
  const testQueue = new Queue('test-queue', {
    redis: 'redis://127.0.0.1:6379',
    limiter: {
      max: 1,
      duration: 1000,
    }
  });
  await testQueue.empty();

  testQueue.process((job, done) => {
    console.log(job.data.id);
    done();
  });

  await testQueue.add({ id: 'foo' }, {
    repeat: { every: 1000 },
    jobId: 'foo-id'
  });

  console.log(`queued foo`);

  setTimeout(async () => {
    await testQueue.add({ id: 'bar' }, {
      repeat: { every: 1000 },
      jobId: 'bar-id'
    });

    console.log(`queued bar`);
  }, 100);
})();

If I comment out the foo task, it prints:

➜  bull-test-case node index
queued bar
bar
bar
bar
bar

If I comment out the bar task, it prints:

➜  bull-test-case node index
queued foo
foo
foo
foo
foo

When both are present, it prints:

➜  bull-test-case node index
queued foo
foo
queued bar
foo
foo
foo
bar
bar
foo
foo
foo
foo
foo
foo

while it should print foo and bar in alternating fashion

Answer 17 · 2020-07-20T10:22:50.000Z

@fspoettel ok. So this example works as designed. In order to achieve what you want you need to remove the previous repeatable job, i.e. you cannot update a given repeatable job. If you add "bar" with a different setting than every: 1000 then it would be added and you will have 2 different repeatable jobs.
For deleting a repeatable job you can either use https://github.com/OptimalBits/bull/blob/develop/REFERENCE.md#queueremoverepeatable specifying the same repeatable options, or https://github.com/OptimalBits/bull/blob/develop/REFERENCE.md#queueremoverepeatablebykey but then you need to use https://github.com/OptimalBits/bull/blob/develop/REFERENCE.md#queuegetrepeatablejobs to get the keys.
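
As a minimal sketch of the key-based route described above, the helper below lists the repeatable jobs and removes each one by its key. It assumes `queue` is a Bull Queue instance; `removeAllRepeatable` is a hypothetical helper name, not part of Bull.

```javascript
// Sketch: remove every repeatable job registered on a queue.
// Assumption: `queue` exposes Bull's getRepeatableJobs() and
// removeRepeatableByKey(key), both of which return promises.
async function removeAllRepeatable(queue) {
  const repeatableJobs = await queue.getRepeatableJobs();
  for (const { key } of repeatableJobs) {
    // Each entry carries the key that removeRepeatableByKey expects.
    await queue.removeRepeatableByKey(key);
  }
  // Return the removed keys so the caller can log them.
  return repeatableJobs.map(job => job.key);
}
```

Calling `await removeAllRepeatable(testQueue)` before re-adding jobs would leave the queue with no repeatable definitions, so a job added afterwards is registered fresh rather than colliding with an existing one.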

Answer 18 · 2020-07-20T10:36:59.000Z

@mseshachalam regarding your issue: if the next timestamp is older than the current timestamp, that means that for some reason the delayed job that is waiting for the next repetition has been removed from the delay set. Did you possibly remove the delay set or call the "empty" function (which also removes the delayed jobs and effectively breaks the repetitions)?
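
The symptom described here can be checked mechanically. The sketch below is a heuristic only: it takes the array returned by getRepeatableJobs and the value returned by getDelayedCount, and flags repeatable jobs whose next timestamp is in the past while nothing is waiting in the delay set. `findOrphanedRepeatables` and the `graceMs` parameter are hypothetical names introduced for illustration; the grace period is an assumption meant to avoid flagging a job that is merely due and about to be picked up.

```javascript
// Sketch: flag repeatable jobs that look orphaned, i.e. their `next`
// timestamp has passed and the delay set is empty.
// Assumptions: `repeatableJobs` has the shape returned by Bull's
// getRepeatableJobs(); `graceMs` is an arbitrary tolerance, not a Bull setting.
function findOrphanedRepeatables(repeatableJobs, delayedCount, now = Date.now(), graceMs = 10000) {
  if (delayedCount > 0) {
    // Something is still scheduled; this heuristic cannot tell which job it is.
    return [];
  }
  return repeatableJobs.filter(job => now - job.next > graceMs);
}
```

With the values from the original report (next 1589657520000, current timestamp 1589657543148, delayed count 0), the single repeatable job is flagged, since its next run is more than 23 seconds overdue.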

Answer 19 · 2020-07-20T10:38:26.000Z

@manast Thanks for your helpful comment. Apologies, I messed up the example and forgot to assign separate jobId params 🙈
The test case then works with correct ordering until I introduce a limiter on the queue which causes order to be lost. I updated the test case but I see that there are already tracking bugs for the limiter elsewhere. Sorry for commenting on the wrong issue, seems like my problem actually lies with the limiter (also sorry OP!)

Answer 20 · 2020-07-20T10:44:42.000Z

no problem!.

Answer 21 · 2020-07-20T10:48:35.000Z

Yes @manast, I have .empty called on the job queues. My code has the following structure.

customSchedulerQueue.process(async (job, done) => {})
await customSchedulerQueue.empty()
await customSchedulerQueue.add(null, { repeat: { cron: '*/2 * * * *' } })
customSchedulerQueue.on('completed', (job, result) => {
    // Job completed with output result!
    logger.log('custom event scheduling completed', result);
});

Answer 22 · 2020-07-20T11:15:07.000Z

@manast

Instead of emptying the queue, I am using

await customSchedulerQueue.removeRepeatable({
    cron: eventSchedulingCronExp,
    jobId: 'csq',
});

and my process is working as expected. Thanks.
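
A minimal sketch of the resulting startup pattern, assuming `queue` is a Bull Queue instance: remove the repeatable definition with the same repeat options, then add it again, without calling empty(). `rescheduleRepeatable` is a hypothetical helper name, and the sketch does not cover custom jobId values.

```javascript
// Sketch: replace a repeatable job at startup without emptying the queue.
// Assumption: `queue` exposes Bull's removeRepeatable(repeat) and
// add(data, opts); `repeat` is the same options object in both calls.
async function rescheduleRepeatable(queue, data, repeat) {
  // Drop the existing definition (and its pending delayed job), if any.
  await queue.removeRepeatable(repeat);
  // Register it again so a fresh delayed job is scheduled.
  return queue.add(data, { repeat });
}
```

For the cron used in this thread, the call would be `await rescheduleRepeatable(customSchedulerQueue, null, { cron: '*/2 * * * *' })`.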

Answer 23 · 2022-09-07T05:27:19.000Z

I also have a similar problem. Repeatable jobs do not start after some time. I have around 10k jobs and they always stay in a "delayed" state. As the definition of the jobs is kept in a separate database, and they are created and deleted in bulk, I use .obliterate() quite often to quickly remove and recreate jobs when a bulk operation is involved. But after some time, the queue stops working. No jobs are picked up any more. The solution I found is to create a new queue with a different name and recreate the jobs, and I am good again. I did not investigate this a lot, but observing a Bull queue using Bull dashboard I noticed that after adding jobs to a broken queue, when the time to launch comes, the jobs are moved to the "waiting" queue for a moment (but not picked up by a worker) and then land in the "delayed" one again, with no further activity later on whatsoever. I can promote a job manually and it is processed.
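
The manual promotion mentioned above could be scripted as a stopgap. The sketch below is a workaround under assumptions, not a fix for the underlying problem: it assumes `queue` is a Bull Queue instance, that each delayed job exposes `timestamp` and `delay` properties whose sum approximates its due time, and that `job.promote()` moves it to the waiting list. `promoteOverdueDelayed` is a hypothetical helper name.

```javascript
// Sketch: promote delayed jobs whose due time has already passed.
// Assumptions: `queue.getDelayed()` returns the delayed jobs, and
// `job.timestamp + job.delay` approximates when the job should have run.
async function promoteOverdueDelayed(queue, now = Date.now()) {
  const delayedJobs = await queue.getDelayed();
  const promoted = [];
  for (const job of delayedJobs) {
    const dueAt = job.timestamp + (job.delay || 0);
    if (dueAt <= now) {
      await job.promote();
      promoted.push(job.id);
    }
  }
  // Ids of the jobs that were promoted, for logging.
  return promoted;
}
```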