SIGKILL on job timeout - no log line in laravel.log
Closed this issue · 4 comments
- Laravel Version: 5.4.36
- PHP Version: PHP 7.1.12-3+ubuntu14.04.1+deb.sury.org+1
- Database Driver & Version: jenssegers/laravel-mongodb v3.2
Description:
Hi, this is my first time posting, so apologies if I missed anything.
Firstly, thank you for this amazing project, it's the backbone of my system.
On to my problem. I rely on queues (rabbitmq + fhteam/laravel-amqp) and they have proved super useful. When a job times out (60 seconds by default), it is killed with SIGKILL, nothing is appended to the Laravel log, and the job is not moved to the failed_jobs table.
I had a problem with this in production - all I saw was my job repeatedly failing, and not being moved to the failed jobs table. So I ran the queue worker manually on the CLI, and all I saw was "Killed". The supervisord logs suggest it was terminated by SIGKILL. Typically this suggests that the OS killed the process due to an OOM problem, however there was nothing in the OS logs.
So it appears as a silent crash. It was only after I worked out the timing that I realised my timeout value was not being respected (this was actually a bug in my code; see the code sample below if curious).
class MyJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // WARNING - THIS TIMEOUT WILL NOT BE RESPECTED - BE CAREFUL IF USING const
    private const JOB_TIMEOUT = 86400; // 1 DAY

    public $timeout = self::JOB_TIMEOUT;

    public function __construct() {}

    public function handle() {}
}
I just wanted to check if the lack of an entry in the log file is a bug or a limitation of how the process is terminated, and if it is a limitation I would appreciate any suggestions on how I can better manage it.
Steps To Reproduce:
- Create a job class with no timeout override (the default is 60 seconds)
- Enqueue a job
- Make sure that the handle() function will take more than 60 seconds
- Start a queue worker
- The job will fail N times, no log lines will be appended to laravel.log, and the job will not enter the failed_jobs table
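If it helps explain the silence: as far as I can tell, Laravel 5.4's worker enforces the timeout by arming pcntl_alarm() and, when the alarm fires, killing its own process (with SIGKILL where posix_kill is available), so no exception is ever thrown and none of the logging or failed-job code gets a chance to run. Here is a minimal standalone sketch of that mechanism (not Laravel's actual code; the echo strings are just for the demo):

```php
<?php
// Sketch: a pcntl_alarm-based timeout ends the process abruptly, so any
// code that would log the failure or insert into failed_jobs never runs.

pcntl_async_signals(true); // PHP 7.1+: deliver signals without declare(ticks=1)

pcntl_signal(SIGALRM, function () {
    // The real worker kills its own process here; everything after this
    // point in the script is simply never reached.
    echo "worker timed out, exiting hard\n";
    exit(0);
});

pcntl_alarm(1);          // the "job timeout" (1 second for the demo)

sleep(3);                // a handle() that overruns its timeout
echo "job finished\n";   // never printed
```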
Thanks again,
I believe you can/should use the queue event system to get hold of these errors. But I'm not 100% sure this also applies to the specific case you described:
Here's a trimmed version of what I did to not miss anything, from my AppServiceProvider.php::boot():
// Requires these imports:
// use Illuminate\Support\Facades\Queue;
// use Illuminate\Queue\Events\JobProcessing;
// use Illuminate\Queue\Events\JobProcessed;
// use Illuminate\Queue\Events\JobExceptionOccurred;
// use Illuminate\Queue\Events\JobFailed;

protected function initializeQueueEvents()
{
    Queue::before(function (JobProcessing $event) {
        $job = $event->job;
        $logger = $this->getLoggerForQueue($job->getQueue());
        $logger->info(\get_class($event), $this->getJobDetails($job));
    });

    Queue::after(function (JobProcessed $event) {
        $job = $event->job;
        $logger = $this->getLoggerForQueue($job->getQueue());
        $logger->info(\get_class($event), $this->getJobDetails($job));
    });

    Queue::exceptionOccurred(function (JobExceptionOccurred $event) {
        $job = $event->job;
        $logger = $this->getLoggerForQueue($job->getQueue());
        $details = $this->getJobDetails($job);
        $details['exception'] = $event->exception;
        $logger->error(\get_class($event), $details);
    });

    Queue::failing(function (JobFailed $event) {
        $job = $event->job;
        $logger = $this->getLoggerForQueue($job->getQueue());
        $details = $this->getJobDetails($job);
        $details['exception'] = $event->exception;
        $logger->error(\get_class($event), $details);
    });
}
Note: I'm using a dedicated logger because our system has so many jobs to process we need a dedicated log "per queue".
TL;DR: you might want to remove getLoggerForQueue and getJobDetails and simply log something using \Log::…
Are you saying that the job isn't thrown into the failed_jobs table even after it reaches maximum number of attempts?
@sisve yes I believe so. I start the queue worker using supervisord, with the following command:
command=php /opt/my_laravel_project/artisan queue:work --queue=QUEUE_1,QUEUE_2 --tries=3 --no-interaction
and the job never enters the failed_jobs table. I'm using MongoDB, which may or may not be an extra complication.
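One thing you could try (a sketch; the program name and log paths are assumptions, not your actual config): queue:work accepts a --timeout option, so passing it on the command line avoids relying on the job's $timeout property, and pointing supervisord's log files somewhere visible means the "Killed" output isn't lost:

```ini
[program:laravel-worker]
; --timeout on the CLI sidesteps the const-initialised $timeout property issue
command=php /opt/my_laravel_project/artisan queue:work --queue=QUEUE_1,QUEUE_2 --tries=3 --timeout=86400 --no-interaction
autorestart=true
; capture the worker's stdout/stderr so the "Killed" line is recorded
stdout_logfile=/var/log/supervisor/laravel-worker.log
redirect_stderr=true
```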
@mfn the Queue::before event fires, but the other three events (Queue::after, Queue::exceptionOccurred, Queue::failing) do not fire :/
Can't replicate this. Just like your other issue report, I think it's specific to your app; you'll need to investigate further, since you're the only one who can reproduce it.