Driver doesn't work correctly when there are failing jobs (as a result of timeout or PHP memory limits)
oprudkyi opened this issue · 2 comments
Hi,
Expected behavior: jobs that fail (with 'memory exhausted', i.e. memory_limit, or with a timeout) are marked as failed after the configured number of tries and go to the failed_jobs table.
Current behavior: jobs that fail are kept at the top of the queue and block all other jobs entirely (in the case of a single worker).
Some investigation:
- According to Laravel ( https://github.com/laravel/framework/blob/5.4/src/Illuminate/Queue/Worker.php#L132 ), the timeout is enforced via pcntl_signal and posix_kill ( https://github.com/laravel/framework/blob/5.4/src/Illuminate/Queue/Worker.php#L569 ), which gives the job no chance to stop gracefully.
- Memory exhaustion is likewise a fatal error. In theory it can be caught via register_shutdown_function, but no such handling exists yet.
- The stock Laravel queue drivers take a special approach to such cases:
  - https://github.com/laravel/framework/blob/5.4/src/Illuminate/Queue/RedisQueue.php maintains a special ':reserved' queue to track jobs that are already being processed - https://github.com/laravel/framework/blob/5.4/src/Illuminate/Queue/LuaScripts.php#L32
  - the db driver also tracks running jobs separately
  - sqs just returns the job to the queue in case of errors (unconfirmed, though)
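To illustrate the register_shutdown_function point above, here is a minimal sketch of how a worker could notice that it is dying from a fatal error. The helper `isFatalError()` and the 32 KB memory reserve are my own assumptions for illustration. Neither Laravel nor laravel-amqp ships this code.

```php
<?php
// Hypothetical helper (not part of Laravel or laravel-amqp): reports whether
// the array returned by error_get_last() describes a fatal error.
function isFatalError(?array $error): bool
{
    if ($error === null) {
        return false;
    }
    $fatalTypes = [E_ERROR, E_PARSE, E_CORE_ERROR, E_COMPILE_ERROR];

    return in_array($error['type'], $fatalTypes, true);
}

// Reserve some memory up front so the handler still has room to run after
// "Allowed memory size exhausted". 32 KB is an arbitrary illustrative size.
$reservedMemory = str_repeat('x', 32 * 1024);

register_shutdown_function(function () use (&$reservedMemory) {
    $reservedMemory = null; // free the reserve before doing any work

    $error = error_get_last();
    if (isFatalError($error)) {
        // A real driver would mark the current job as failed here.
        error_log('Worker died with fatal error: ' . $error['message']);
    }
});
```

Note that this could only cover the memory_limit case. A process that is ended with SIGKILL does not run shutdown functions, so the timeout case would still need broker-side tracking.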
With AMQP the approach is different:
pop() just receives a message from the queue ( https://github.com/fhteam/laravel-amqp/blob/master/src/Queue/AMQPQueue.php#L350 ) without acknowledging it (i.e. without calling $this->channel->basic_ack ),
so even if the worker/job fails as a result of a fatal error (memory exhausted) or a kill (timeout), the message stays at the top of the queue with no chance of being removed (this is by design in AMQP/RabbitMQ).
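The blocking behavior can be reproduced with a toy in-memory model. This is only a sketch of the semantics described above, not the php-amqplib API: `ToyBroker` and its methods are hypothetical names, and the model assumes unacknowledged messages return to the head of the queue when the consumer dies.

```php
<?php
// Toy in-memory model of a broker queue. NOT the php-amqplib API.
final class ToyBroker
{
    /** @var string[] messages ready for delivery, head first */
    private array $ready = [];
    /** @var string[] delivered but unacknowledged, keyed by delivery tag */
    private array $unacked = [];
    private int $nextTag = 1;

    public function publish(string $body): void
    {
        $this->ready[] = $body;
    }

    // Deliver the head message without acknowledging it (like pop() above).
    public function get(): ?array
    {
        if ($this->ready === []) {
            return null;
        }
        $tag = $this->nextTag++;
        $this->unacked[$tag] = array_shift($this->ready);

        return ['tag' => $tag, 'body' => $this->unacked[$tag]];
    }

    public function ack(int $tag): void
    {
        unset($this->unacked[$tag]);
    }

    // The consumer died: every unacknowledged message returns to the head.
    public function consumerDied(): void
    {
        $this->ready = array_merge(array_values($this->unacked), $this->ready);
        $this->unacked = [];
    }
}

$broker = new ToyBroker();
$broker->publish('job that exhausts memory');
$broker->publish('healthy job');

$seen = [];
for ($run = 0; $run < 3; $run++) {
    $message = $broker->get();
    $seen[] = $message['body'];
    // Simulated fatal error: the worker exits before it can call ack().
    $broker->consumerDied();
}
```

In this model every run receives the same poisoned job, and 'healthy job' is never delivered, which matches what a single worker sees.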
A simple fix could be to call basic_ack in pop() and, in case of an explicit exception, just push the job back onto the queue. Such an approach would lose any job that hits a fatal error, though the queue itself won't be blocked.
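A rough sketch of that simple fix, using an SplQueue as a stand-in for the AMQP queue. `processWithEarlyAck()` and the `attempts` field are hypothetical. In a real driver the dequeue step would correspond to receiving the message and calling basic_ack right away.

```php
<?php
// Toy sketch of the "acknowledge in pop()" idea. The SplQueue stands in for
// the AMQP queue and processWithEarlyAck() is a hypothetical name.
function processWithEarlyAck(SplQueue $queue, callable $handler): void
{
    // Dequeuing models receiving the message and acknowledging it at once:
    // the broker forgets the message before the job runs.
    $job = $queue->dequeue();

    try {
        $handler($job);
    } catch (Throwable $e) {
        // An explicit exception is catchable, so the job can be re-published.
        $job['attempts']++;
        $queue->enqueue($job);
    }
    // A fatal error or SIGKILL would end the process inside $handler,
    // so the catch block never runs and the job is lost.
}

$queue = new SplQueue();
$queue->enqueue(['name' => 'throws', 'attempts' => 0]);
$queue->enqueue(['name' => 'ok', 'attempts' => 0]);

$handled = [];
$handler = function (array $job) use (&$handled) {
    if ($job['name'] === 'throws') {
        throw new RuntimeException('job failed');
    }
    $handled[] = $job['name'];
};

processWithEarlyAck($queue, $handler); // 'throws' goes back to the tail
processWithEarlyAck($queue, $handler); // 'ok' is no longer blocked
```

The trade-off is visible in the comments: the queue keeps moving, but only failures that surface as exceptions get a retry.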
A more complex approach would involve additional queues just to keep track of running jobs, so that jobs killed by fatal errors can be detected.
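One way such tracking could look, loosely modeled on the reserved-queue idea of the stock drivers: a job moves to a "reserved" list when it is popped and comes back once its reservation expires. Everything here (`TrackedQueue`, `retryAfter`, `maxTries`, keying by job name) is a hypothetical simplification, and timestamps are passed in explicitly to keep the example deterministic.

```php
<?php
// Toy model of a "reserved" tracking list with an expiry window.
// Class and method names are hypothetical, not taken from any driver.
final class TrackedQueue
{
    public array $ready = [];
    public array $reserved = [];
    public array $failed = [];
    private int $retryAfter;
    private int $maxTries;

    public function __construct(int $retryAfter, int $maxTries)
    {
        $this->retryAfter = $retryAfter;
        $this->maxTries = $maxTries;
    }

    public function push(string $name): void
    {
        $this->ready[] = ['name' => $name, 'attempts' => 0];
    }

    public function pop(int $now): ?array
    {
        $this->migrateExpired($now);
        if ($this->ready === []) {
            return null;
        }
        $job = array_shift($this->ready);
        $job['attempts']++;
        $job['expires_at'] = $now + $this->retryAfter;
        $this->reserved[$job['name']] = $job; // keyed by name for simplicity

        return $job;
    }

    // Called when a job finishes normally.
    public function delete(array $job): void
    {
        unset($this->reserved[$job['name']]);
    }

    // Jobs whose worker died silently reappear once the reservation expires,
    // or move to the failed list when they have used up their tries.
    private function migrateExpired(int $now): void
    {
        foreach ($this->reserved as $name => $job) {
            if ($job['expires_at'] > $now) {
                continue;
            }
            unset($this->reserved[$name]);
            if ($job['attempts'] >= $this->maxTries) {
                $this->failed[] = $job;
            } else {
                $this->ready[] = $job;
            }
        }
    }
}

$queue = new TrackedQueue(60, 2);
$queue->push('crashes');
$queue->push('healthy');

$queue->pop(0);           // 'crashes' is reserved, then the worker dies silently
$second = $queue->pop(1); // 'healthy' is delivered, so the queue is not blocked
$queue->delete($second);
$queue->pop(61);          // reservation expired: 'crashes' gets its second try
$last = $queue->pop(122); // expired again with no tries left: moved to failed
```

With this bookkeeping a job that kills its worker is retried a bounded number of times and then ends up in the failed list, which is the expected behavior described at the top of this issue.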
@oprudkyi try my commit with the "retry_after" option. It restarts failed jobs after a timeout, so the queue won't be locked.
I think we fixed this with the recent PR I've just merged into master. Could you please test?