ostark/craft-async-queue

Parameter for number of background processes is ignored

Closed this issue · 27 comments

We're facing the problem that the parameter for the number of concurrent background processes – set in the environment file via ASYNC_QUEUE_CONCURRENCY=4 – is ignored at times. At first, everything seems fine and the maximum number of reserved processes equals the number set in the env file. However, after a while (hard to say when exactly, but it does happen) the number of concurrent processes starts to increase, bringing the CPU close to 100% and endangering the uptime of the website. This is particularly worrying when there are a lot of jobs to be dealt with. Restarting PHP resolves the issue temporarily, but isn't a feasible long-term solution.
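For reference, this is the relevant line in our .env file (the variable name and value are the ones mentioned above; the comment is just annotation):

# Maximum number of concurrent background queue processes (ostark/craft-async-queue)
ASYNC_QUEUE_CONCURRENCY=4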

In the screenshots below, you can see that a limit of 4 is set in the plugin, yet there are 7 concurrent processes. We're running Craft 3.4.10.1 (the latest as of today) with PHP 7.2 on a web server running Ubuntu 18.04.

Screenshot 2020-03-24 at 10 55 53

Screenshot 2020-03-24 at 10 54 23

Servus @martinhellwagner!
It's probably related to #27 – the details you provided are useful and could be a lead to solving it eventually.

Grüß dich @ostark! 🙂

Glad I could help! Let me know if you need further information. I'm monitoring the queue as we speak in order to find a pattern. 4 concurrent jobs keep the CPU at 60-70% according to our monitoring, while 6+ concurrent jobs drive it up to 90-100%.

By the way, this is what the queue looks like when everything's still good (for example after a PHP restart):

Screenshot 2020-03-24 at 11 30 11

Can you watch out for the message "Has not waited the lock." in your queue/console/web logs?
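A quick way to scan for it (assuming the default Craft setup where logs live under storage/logs):

grep -R "Has not waited the lock" storage/logs/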

Strangely, we don't have a queue.log file in our currently deployed release on the Production system – only on the Stage system.

However, I've found two instances of this error in the console.log file on the Production system:

2020-03-23 19:39:19 [-][-][-][error][yii\base\Exception] yii\base\Exception: Has not waited the lock. in /home/forge/www.bistum-passau.de/releases/165c676b83330391a42e7d3556134840dca5ce29/vendor/craftcms/cms/src/queue/Queue.php:562
Stack trace:
#0 /home/forge/www.bistum-passau.de/releases/165c676b83330391a42e7d3556134840dca5ce29/vendor/craftcms/cms/src/queue/Queue.php(120): craft\queue\Queue->reserve()
#1 /home/forge/www.bistum-passau.de/releases/165c676b83330391a42e7d3556134840dca5ce29/vendor/craftcms/cms/src/queue/Command.php(77): craft\queue\Queue->run()
#2 [internal function]: craft\queue\Command->actionRun()
#3 /home/forge/www.bistum-passau.de/releases/165c676b83330391a42e7d3556134840dca5ce29/vendor/yiisoft/yii2/base/InlineAction.php(57): call_user_func_array(Array, Array)
#4 /home/forge/www.bistum-passau.de/releases/165c676b83330391a42e7d3556134840dca5ce29/vendor/yiisoft/yii2/base/Controller.php(157): yii\base\InlineAction->runWithParams(Array)
#5 /home/forge/www.bistum-passau.de/releases/165c676b83330391a42e7d3556134840dca5ce29/vendor/yiisoft/yii2/console/Controller.php(164): yii\base\Controller->runAction('run', Array)
#6 /home/forge/www.bistum-passau.de/releases/165c676b83330391a42e7d3556134840dca5ce29/vendor/yiisoft/yii2/base/Module.php(528): yii\console\Controller->runAction('run', Array)
#7 /home/forge/www.bistum-passau.de/releases/165c676b83330391a42e7d3556134840dca5ce29/vendor/yiisoft/yii2/console/Application.php(180): yii\base\Module->runAction('queue/run', Array)
#8 /home/forge/www.bistum-passau.de/releases/165c676b83330391a42e7d3556134840dca5ce29/vendor/craftcms/cms/src/console/Application.php(87): yii\console\Application->runAction('queue/run', Array)
#9 /home/forge/www.bistum-passau.de/releases/165c676b83330391a42e7d3556134840dca5ce29/vendor/yiisoft/yii2/console/Application.php(147): craft\console\Application->runAction('queue/run', Array)
#10 /home/forge/www.bistum-passau.de/releases/165c676b83330391a42e7d3556134840dca5ce29/vendor/yiisoft/yii2/base/Application.php(386): yii\console\Application->handleRequest(Object(craft\console\Request))
#11 /home/forge/www.bistum-passau.de/releases/165c676b83330391a42e7d3556134840dca5ce29/craft(23): yii\base\Application->run()
#12 {main}
2020-03-23 19:39:16 [-][-][-][info][application] $_GET = []

What I've also noticed: when I restart PHP because too many jobs are being executed simultaneously, I sometimes run into the same issue described in #27 – pending jobs aren't executed. I can force them to run via ./craft queue/run in the Craft CLI, though.

If you don't restart PHP, the queue keeps adding more and more concurrent processes (currently 9):

Screenshot 2020-03-24 at 13 34 38

Which of course results in enormous CPU consumption:

Screenshot 2020-03-24 at 13 37 26

It would be great to get the related logs:

  1. Enable debugMode (see the config sketch below)
  2. tail -f storage/logs/web.log | grep ProcessPool
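
In case it isn't on yet, a minimal config/general.php sketch (assuming "debugMode" refers to Craft's devMode setting; trace-level messages like the ProcessPool lines are typically only written when it's enabled):

// config/general.php
return [
    'devMode' => true,
];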

There's an awful lot of output on the console for this command, since the queue has just finished running 40k jobs. Does it make sense to attach the log files as a ZIP here for you to look into?

just 20 lines or so

These are the latest logs I'm getting:

forge@bistum01:~/www.bistum-passau.de/current$ tail -f storage/logs/web.log | grep ProcessPool
2020-03-24 13:53:00 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:00 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:00 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:00 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:00 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:00 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:00 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:00 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:00 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:00 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:00 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:01 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
tail: storage/logs/web.log: file truncated
2020-03-24 13:53:02 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:02 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:02 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:02 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:02 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:02 [-][12875][-][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:02 [-][12875][-][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:02 [-][12875][-][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:02 [-][12875][-][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Deleting stale template caches)
2020-03-24 13:53:04 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:04 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:04 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:04 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)
2020-03-24 13:53:04 [-][12875][gfvr2kck6te0f0di3rm38n7ip3][trace][async-queue] ProcessPool::canIUse() (4 of 4, context: Updating search indexes)

devMode is enabled

Currently, we're back at 4 concurrent jobs after a PHP restart. I'll keep monitoring and post logs again when there are more than 4 concurrent jobs.

Can you try

composer require ostark/craft-async-queue:dev-reserved#a8b8d74f1af44441ad9dc6e48639770849397096

I've installed the new version with the command you posted above.

Even though the async queue test works fine with the maximum number of concurrent jobs (still set to 4), the "normal" queue doesn't. I've triggered a queue of 5,000+ jobs and watched it being processed. Still 5+ jobs at a time:

Screenshot 2020-03-25 at 14 49 41

Am I missing something here? The "pool concurrency" is set correctly, while the value for "pool actual usage" is suddenly zero:

Screenshot 2020-03-25 at 14 52 11

I've triggered a queue of 5,000+ jobs

Can you explain how you do it?

--
FYI: the Info table on the Test Utility is a bit misleading so far. I'll push an update later today.

From the CP, I rebuilt the asset indexes of one of our volumes, which contains 115 images. Since this particular project has a lot of multi-site sites, that action triggered around 5,700 queue jobs – mostly Updating search indexes, as seen in the screenshot above.

Honestly, I can't reproduce how you end up with that many processes (6 instead of 4) with the changes I made.
I released 2.1 (f53f835).

Closing this issue. Hopefully, the new release prevents real damage to your server.

Okay, I'll install the new release and check again. Thanks for your help! 🙂

For what it's worth, I could now reproduce the "Has not waited the lock" error in the Craft CLI (with version 2.0.0 on our Production system):

Screenshot 2020-03-26 at 23 13 53

Will upgrade the Production system to version 2.1.0 though.

Let me know if it makes a difference.

Unfortunately not. The same error still pops up in the CLI. In the Queue Manager – even though I increased the limit of concurrent jobs to 16 for testing purposes – I'm still getting weird numbers and behaviour.

Screenshot 2020-03-27 at 10 56 37

Screenshot 2020-03-27 at 10 56 55

When you set the max concurrency to 16, that is the upper limit of queue/run processes. 'Reserved' in the manager table and 'Reserved Jobs' in the test utility should be the same, as the source is the same.
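If you want to verify the number outside the UI, it ultimately comes from Craft's queue table. A rough SQL sketch (assuming Craft 3's default schema, where reserved jobs are rows with fail = 0 and a non-null timeUpdated; add your table prefix if you use one):

SELECT COUNT(*) AS reserved
FROM queue
WHERE fail = 0 AND timeUpdated IS NOT NULL;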

Actually, we just discovered that since updating to 2.1.0, we're receiving multiple deadlock messages in our logs when trying to upload images:

Screenshot 2020-03-27 at 11 38 31

Have to revert back to 2.0.0, it seems.

Does the upper limit mean that there can be fewer (but never more) concurrent jobs? Is it possible to tell the queue to use an exact number of concurrent processes – never fewer, never more?

We're trying to find a setup that gives us a fast queue while keeping server load at a reasonable level.

Sorry, that's beyond the scope of the plugin. ASYNC_QUEUE_CONCURRENCY should prevent damage, i.e. heavy load on the server.

I would not use Craft's DB queue for high concurrency. Use SQS or Redis and run multiple queue/listen processes.
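For example, with a Redis or SQS queue driver in place, a supervisord sketch along these lines keeps a fixed number of listener processes running (program name, path and user are placeholders; the exact console command depends on which queue driver you configure):

[program:craft-queue-listener]
command=php /home/forge/www.bistum-passau.de/current/craft queue/listen
numprocs=4
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
user=forge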