Repeatable job never gets processed sometimes
malisetti opened this issue · 23 comments
I have observed that a repeatable job sometimes never gets picked up for processing. When this happens (it seems random), getRepeatableJobs shows a next value that is always lower than the current timestamp, and both getDelayedCount and getPausedCount return 0.
Minimal, Working Test code to reproduce the issue
const Queue = require('bull');

const helloQueue = new Queue('hello-queue', 'redis://127.0.0.1:6379');
(async () => {
  // Processor: print a message every time the repeatable job fires.
  helloQueue.process(async (job, done) => {
    console.log('hello world');
    done();
  });
  // Register a repeatable job that should run every 2 minutes.
  const c = await helloQueue.add(null, {
    repeat: { cron: '*/2 * * * *' },
  });
  // Every 10 seconds, dump the queue state and the current timestamp.
  setInterval(() => {
    helloQueue.getDelayedCount().then(j => console.log(j));
    helloQueue.getPausedCount().then(j => console.log(j));
    helloQueue.getRepeatableJobs().then(ji => console.log(ji));
    console.log(new Date().getTime());
  }, 10 * 1000);
})();
Output:
current timestamp: 1589657543148
[
  {
    key: '__default__::::*/2 * * * *',
    name: '__default__',
    id: null,
    endDate: null,
    tz: null,
    cron: '*/2 * * * *',
    every: null,
    next: 1589657520000
  }
]
getDelayedCount: 0
getPausedCount: 0
Create a hello world cron job which prints to the console every 2 minutes
Bull version: "bull": "^3.13.0"
Additional information
The next timestamp is lower than the current timestamp, and "hello world" is not printed every 2 minutes.
Sometimes this also happens with the every repeat option.
1589659117238
[
  {
    key: '__default__:::120000',
    name: '__default__',
    id: null,
    endDate: null,
    tz: null,
    cron: null,
    every: 120000,
    next: 1589659080000
  }
]
0
0
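The state shown in both outputs (a next value in the past while the delayed count is 0) could be checked programmatically. The following is only a minimal sketch: findStalledRepeatables is a hypothetical helper, not part of Bull, and the check can also be true for a brief moment while a job is waiting or active, so it is a hint rather than proof of a stuck schedule.

```javascript
// Hypothetical helper (not part of Bull): flags repeatable entries whose
// `next` run time has already passed while no delayed job is waiting.
function findStalledRepeatables(repeatables, delayedCount, now = Date.now()) {
  // If any delayed job exists, the counts alone cannot tell us which
  // repeatable it belongs to, so report nothing.
  if (delayedCount > 0) return [];
  return repeatables.filter((r) => typeof r.next === 'number' && r.next < now);
}

// Values copied from the output above.
const stalled = findStalledRepeatables(
  [{ key: '__default__:::120000', every: 120000, next: 1589659080000 }],
  0,
  1589659117238
);
console.log(stalled.map((r) => r.key)); // [ '__default__:::120000' ]
```

With a real queue, the first two arguments would come from getRepeatableJobs and getDelayedCount, as in the test code above.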
The same issue also occurs with https://github.com/taskforcesh/bullmq
I am facing the same problem. At some point the delay counter is negative and the repeatable job is no longer scheduled.
One observation: if the current run didn't handle any jobs, restarting the worker process picks them up.
Tried with "bull": "^3.15.0"; it does not seem to be fixed.
We're using Bull for repeated jobs only. About 1-2 weeks ago I saw the behavior you described. Jobs were working fine for some time but suddenly processing stopped. Since we upgraded to 3.15.0 we didn't see this behavior anymore.
I set up a minimal example with your code, Node 12.18.1 and version 3.15.0 of Bull locally. Until now (about 3 hours) everything is working fine. How long do you usually have to wait until "hello world" isn't printed anymore?
The issue I observed is not about "jobs not running after sometime". I am facing an issue where if I restart the worker process, it will not pick up any jobs to process. So each start of the worker process behaves differently. Hope I am clear. Let me know your thoughts.
Same issue here. I also experience this with repeated jobs when I try to clear them from the queue before adding: #1792.
Ran into this issue as well. The every as well as repeatable options are completely unusable due to this, imo. Had to resort to using my own scheduler with a job-in-queue check.
@fspoettel can you be more explicit: what is "completely unusable"? Can you provide a code example that reproduces the issue?
Guys, if you can provide a use case that reproduces the issue, I can look into it. As this issue currently stands, it is not reproducible. Also note that repeatable jobs are working fine for many people, so this must be some edge case. I would love to look into it, but I need some code I can use to reproduce it.
Yes, I can produce a test case for this, give me some time to isolate it from the project I'm using bull in. I'll try to provide a repo with an isolated example.
I ran into this problem not too long ago. It boils down to the queue not respecting the order of repeatable tasks when a number of tasks are registered in a very short time frame. Some tasks are executed multiple times in a row while others are "stuck" in delayed state. The problem gets worse if rate limiting option is involved.
Great, I will look into it as soon as you provide the test case.
@manast hi, thanks for this library. Hope this issue is clear to you. I have provided the sample code to reproduce this issue at the first comment.
@manast This is a minimal example of what I ran into (edit: fix and add rate limiter which actually causes the issue)
const Queue = require('bull');

(async () => {
  const testQueue = new Queue('test-queue', {
    redis: 'redis://127.0.0.1:6379',
    limiter: {
      max: 1,
      duration: 1000,
    }
  });
  await testQueue.empty();
  testQueue.process((job, done) => {
    console.log(job.data.id);
    done();
  });
  await testQueue.add({ id: 'foo' }, {
    repeat: { every: 1000 },
    jobId: 'foo-id'
  });
  console.log(`queued foo`);
  setTimeout(async () => {
    await testQueue.add({ id: 'bar' }, {
      repeat: { every: 1000 },
      jobId: 'bar-id'
    });
    console.log(`queued bar`);
  }, 100);
})();
If I comment out the foo task, it prints:
➜ bull-test-case node index
queued bar
bar
bar
bar
bar
If I comment out the bar task, it prints:
➜ bull-test-case node index
queued foo
foo
foo
foo
foo
When both are present, it prints:
➜ bull-test-case node index
queued foo
foo
queued bar
foo
foo
foo
bar
bar
foo
foo
foo
foo
foo
foo
while it should print foo and bar in alternating fashion.
@fspoettel ok. So this example works as designed. In order to achieve what you want you need to remove the previous repeatable job, i.e. you cannot update a given repeatable job. If you add "bar" with a different setting than every: 1000 then it would be added and you will have 2 different repeatable jobs.
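To illustrate why a second add with the same settings does not create a second repeatable job, here is a rough sketch of how the key could be derived. The repeatKey function is hypothetical and is inferred only from the keys printed earlier in this thread ('__default__::::*/2 * * * *' and '__default__:::120000'); Bull's real implementation may differ. Note that the job data ({ id: 'foo' } vs { id: 'bar' }) does not appear in those keys.

```javascript
// Hypothetical re-creation of the repeat key, inferred only from the keys
// printed in this thread; this is not Bull's source code.
function repeatKey(name, repeat, jobId) {
  const id = jobId ? jobId : '';
  const endDate = repeat.endDate ? new Date(repeat.endDate).getTime() : '';
  if (repeat.cron) {
    // Cron keys carry an extra (possibly empty) timezone segment.
    const tz = repeat.tz ? repeat.tz : '';
    return `${name}:${id}:${endDate}:${tz}:${repeat.cron}`;
  }
  return `${name}:${id}:${endDate}:${repeat.every}`;
}

const a = repeatKey('__default__', { every: 1000 });
const b = repeatKey('__default__', { every: 1000 });
const c = repeatKey('__default__', { every: 1000 }, 'bar-id');
console.log(a === b); // true: same name and settings, so the same key
console.log(a === c); // false: a distinct jobId gives a distinct key
```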
For deleting a repeatable job you can either use https://github.com/OptimalBits/bull/blob/develop/REFERENCE.md#queueremoverepeatable specifying the same repeatable options, or https://github.com/OptimalBits/bull/blob/develop/REFERENCE.md#queueremoverepeatablebykey but then you need to use https://github.com/OptimalBits/bull/blob/develop/REFERENCE.md#queuegetrepeatablejobs to get the keys.
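The key-based route could look roughly like this. It is a sketch only: removeAllRepeatables is a hypothetical helper name, and queue is assumed to be a Bull queue exposing the getRepeatableJobs and removeRepeatableByKey methods linked above.

```javascript
// Hypothetical helper: removes every repeatable job registered on `queue`.
// `queue` is assumed to be a Bull queue; only getRepeatableJobs() and
// removeRepeatableByKey() from the linked reference are used.
async function removeAllRepeatables(queue) {
  const repeatables = await queue.getRepeatableJobs();
  for (const { key } of repeatables) {
    // Each entry carries the key that removeRepeatableByKey expects.
    await queue.removeRepeatableByKey(key);
  }
  // Return the removed keys so the caller can log or verify them.
  return repeatables.map((r) => r.key);
}
```

Usage would be something like `await removeAllRepeatables(testQueue)` before adding the repeatable jobs again.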
@mseshachalam regarding your issue: if the next timestamp is older than the current timestamp, that means that for some reason the delayed job waiting for the next repetition has been removed from the delay set. Did you possibly remove the delay set or call the "empty" function (which also removes the delayed jobs and will effectively break the repetitions)?
@manast Thanks for your helpful comment. Apologies, I messed up the example and forgot to assign separate jobId params.
The test case then works with correct ordering until I introduce a limiter on the queue, which causes the order to be lost. I updated the test case, but I see that there are already tracking bugs for the limiter elsewhere. Sorry for commenting on the wrong issue; it seems my problem actually lies with the limiter (also sorry OP!)
No problem!
Yes @manast, I do call .empty on the job queues. My code has the following structure:
customSchedulerQueue.process(async (job, done) => {})
await customSchedulerQueue.empty()
await customSchedulerQueue.add(null, { repeat: { cron: '*/2 * * * *' } })
customSchedulerQueue.on('completed', (job, result) => {
  // Job completed with output result!
  logger.log('custom event scheduling completed', result);
});
Instead of emptying the queue, I am now using
await customSchedulerQueue.removeRepeatable({
  cron: eventSchedulingCronExp,
  jobId: 'csq',
});
and my process is working as expected. Thanks.
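For anyone landing here with the same startup pattern, the replacement for empty() could be wrapped up roughly as follows. This is a sketch under assumptions: rescheduleRepeatable is a hypothetical helper, queue is assumed to be a Bull queue, and the optional jobId is passed inside the repeat options for removeRepeatable (as in the snippet above) and as a top-level option for add.

```javascript
// Hypothetical startup helper: `queue` is assumed to be a Bull queue.
async function rescheduleRepeatable(queue, data, repeat, jobId) {
  // Remove only this schedule entry instead of emptying the whole queue.
  const removeOpts = jobId ? { ...repeat, jobId } : { ...repeat };
  await queue.removeRepeatable(removeOpts);
  // Register the schedule again so the next repetition gets queued.
  const addOpts = jobId ? { repeat, jobId } : { repeat };
  return queue.add(data, addOpts);
}
```

Usage would be something like `await rescheduleRepeatable(customSchedulerQueue, null, { cron: '*/2 * * * *' }, 'csq')` in place of the empty() and add() pair.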
I also have a similar problem: repeatable jobs stop starting after some time. I have around 10k jobs, and they always stay in the "delayed" state. The job definitions are kept in a separate database and are created and deleted in bulk, so I use .obliterate() quite often to quickly remove and recreate jobs when a bulk operation is involved. After some time, though, the queue stops working and no jobs are picked up any more. The workaround I found is to create a new queue with a different name and recreate the jobs, after which everything works again. I did not investigate this much, but while observing the queue in the Bull dashboard I noticed that when the launch time comes for jobs added to a broken queue, they are moved to the "waiting" queue for a moment (but not picked up by a worker) and then land in "delayed" again, with no further activity whatsoever. I can promote a job manually and it is processed.