taskiq-python/taskiq

Python GIL, worker hangs forever

toan-pb opened this issue · 6 comments

When dispatching nested tasks many times (for example, 100 times within one second), the worker hangs forever (I suspected the Python Global Interpreter Lock, GIL).
For now I work around this with a timeout in task.wait_result(timeout=5).
I believe there must be a better way to solve this problem. Below is my code.

import asyncio
import logging

from taskiq_aio_pika import AioPikaBroker

logger = logging.getLogger(__name__)
# Connection URL is illustrative.
broker = AioPikaBroker("amqp://guest:guest@localhost:5672")


@broker.task
async def health_check_worker() -> None:
    """Health check worker: kicks task_2 and waits for its result."""
    logger.info("Worker still alive!")
    task = await task_2.kiq()
    await task.wait_result()


@broker.task
async def task_2() -> None:
    await asyncio.sleep(1)
    task = await task_3.kiq()
    await task.wait_result()


@broker.task
async def task_3() -> None:
    await asyncio.sleep(1)

If I'm understanding the problem correctly, it may be that some of the outer tasks are awaiting results of inner tasks that have not yet been scheduled to run, so the queue deadlocks.

This would make the most sense to me, since the default number of concurrent async tasks a single worker can run is 100. If you start the worker with --max-async-tasks set to something much larger (e.g. --max-async-tasks 1000) and then kick 100 tasks, do you see the same issue?

Thank you.
I've tried the arguments --max-async-tasks 10000 and --max-threadpool-threads 10000, but I still encounter the same issue.
Only increasing the number of workers with -w 100 (which consumes quite a lot of resources) seems to work: at 100 tasks/s I no longer see the issue, though I suspect it would return at 1000 tasks/s.
Perhaps increasing the number of workers is currently the only way to prevent the queue from deadlocking.
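For reference, the flags mentioned above are passed to the taskiq worker CLI; a sketch of the invocations tried (the module path my_app.tkq:broker is a placeholder for wherever the broker is defined):

```shell
# Raise per-worker concurrency limits (placeholder module path).
taskiq worker my_app.tkq:broker --max-async-tasks 10000 --max-threadpool-threads 10000

# Or scale out with more worker processes instead.
taskiq worker my_app.tkq:broker -w 100
```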

What broker are you using? If it's the AioPikaBroker there might be something going on with message acknowledgements that would result in your situation (i.e. increasing max async tasks and threads still causes deadlock, but increasing the number of workers avoids it).

Thank you.
After switching to the Redis ListQueueBroker, I no longer hit the deadlock.
Perhaps I need a config parameter for message acknowledgements if I want to keep using AioPikaBroker.

Glad I could help. If you pass the qos argument to AioPikaBroker (AioPikaBroker(qos=100)), the worker can prefetch more messages before acknowledging them to the RabbitMQ backend. The default is 10, but it can be set to any number, and I believe setting it to 0 allows unlimited consumption of unacknowledged messages. Combined with an increased --max-async-tasks, this should also reduce the likelihood of a deadlock (although IIRC the taskiq documentation cautions against an unlimited number of concurrently running async tasks, as this can lead to undefined behaviour).
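A minimal broker configuration along those lines might look like this (the connection URL is illustrative; qos is the prefetch count described above):

```python
from taskiq_aio_pika import AioPikaBroker

# qos caps how many unacknowledged messages this worker may hold at once.
# The default is 10; raising it lets nested tasks keep making progress.
broker = AioPikaBroker(
    "amqp://guest:guest@localhost:5672",  # illustrative connection URL
    qos=100,  # or 0 for unlimited prefetch (use with care)
)
```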

Setting the qos parameter in AioPikaBroker(qos=100) (or even to 0), combined with --max-async-tasks, has solved my problem. Thank you very much.