python-bonobo/bonobo

Async calls in nodes

farzeni opened this issue · 3 comments

Hello Romain,

Thanks for this awesome piece of code.

I'm using bonobo to migrate data from legacy systems to our product and I'm facing a performance problem. One of the nodes in my graph needs to make a lot of API calls, one for every processed record. What is happening is that although all the other nodes keep processing data, that specific node gains no performance improvement from threading. As I understand it, each node has its own thread, so all calls inside a node are blocking and execute one after another.

I was wondering if there is a way to use async calls in graph execution, letting us rely on something like aiohttp instead of requests.
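For illustration, here is a minimal sketch of the bottleneck (not actual bonobo code: the hypothetical `enrich` node stands in for the API-calling one, and `time.sleep` simulates a blocking HTTP request instead of `requests.get`):

```python
import time

def extract():
    # Upstream node: yields records quickly.
    for i in range(5):
        yield {"id": i}

def enrich(record):
    # One blocking "API call" per record. Because each node runs in a
    # single thread, these calls execute strictly one after another,
    # so 5 records cost ~5 * 0.1s in this node alone.
    time.sleep(0.1)  # stands in for requests.get(...)
    return {**record, "enriched": True}

start = time.perf_counter()
results = [enrich(r) for r in extract()]
elapsed = time.perf_counter() - start
print(f"{len(results)} records in {elapsed:.2f}s")  # roughly 0.5s, not 0.1s
```

The per-record latency adds up linearly, which is exactly what concurrent calls (via aiohttp or similar) would avoid.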

Digging into the code, I saw that you were implementing an aio_threadpool strategy, and maybe it is related to my problem in some way.

Am I right? If so, what is the current status? Maybe I can help you with it.

Hi @farzeni.

Async support is nowhere near ready. After trying to build it on the current architecture, I concluded that the current strategy interface is not suited to going async. This does not mean that we won't go down the async road, just not the way it was first imagined: the architecture needs to be designed as asynchronous from the start to exploit it correctly.

In short, the aio_threadpool thing did not solve the problem, because it still runs synchronous code in each thread, even though the threads are managed by asyncio. So that's a dead end, but I have a better option.

Before that, and as of today, your best (yet shitty) option is to run an io loop inside a node and yield synchronously the things you got asynchronously. This is a hell to write, because you need to accept/release messages as if they were already processed, and keep a buffer to release output messages in the future. Something you pretty much want to avoid writing.
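A naive sketch of that workaround, under stated assumptions: `fake_api_call` is a hypothetical stand-in for an aiohttp request, and the node buffers a whole batch up front, which sidesteps the accept/release bookkeeping described above (that bookkeeping is the genuinely painful part):

```python
import asyncio

async def fake_api_call(record):
    # Hypothetical stand-in for an aiohttp request.
    await asyncio.sleep(0.1)
    return {**record, "enriched": True}

def enrich_batch(records):
    # A regular (synchronous) generator node that spins up its own
    # event loop, fires all API calls concurrently, then yields the
    # gathered results synchronously downstream.
    async def gather_all():
        return await asyncio.gather(*(fake_api_call(r) for r in records))
    for result in asyncio.run(gather_all()):
        yield result

records = [{"id": i} for i in range(5)]
results = list(enrich_batch(records))
```

The five simulated calls overlap inside the node's private loop (~0.1s total instead of ~0.5s), but the node only releases output once the whole batch is done, which is why this approach does not scale nicely across a graph.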

The next step towards aio-nobo is to rewrite the executor part of bonobo to be asynchronous first (incidentally, it can also process synchronous code). This is quite a lot of work and it is not ready yet, but the groundwork has already been implemented and tested, and (yay) it works, kinda. The related code is in the following branch/folder: https://github.com/hartym/bonobo/tree/executor_reloaded/bonobo/execution/reloaded, and it requires a lot of work to integrate back into bonobo and release something. The idea behind it is that everything gets converted to an asynchronous generator: synchronous code runs in a thread, while asynchronous code is already in the right shape. There are still a lot of things to do before it's even testable in real-life conditions, I'm afraid, and unfortunately I have not found much time lately to work on it. I don't know if the code is in a clean enough state for you to help with it; if you want to give it a shot, we can talk about it (I first need to refresh my knowledge of what remains to be done).
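This is not the actual reloaded executor, just a toy sketch of the unification idea: adapt every synchronous generator into an async generator (pulling items in a worker thread so the event loop never blocks), after which sync and async nodes compose uniformly:

```python
import asyncio

def sync_node(rows):
    # An ordinary synchronous generator node.
    for row in rows:
        yield row * 2

async def async_node(rows):
    # An asynchronous generator node: already in the target shape.
    for row in rows:
        await asyncio.sleep(0)  # e.g. an awaited I/O call
        yield row + 1

async def as_async_gen(sync_gen):
    # Adapt a sync generator to an async one: each next() runs in a
    # worker thread, so slow synchronous nodes never block the loop.
    sentinel = object()
    it = iter(sync_gen)
    while (item := await asyncio.to_thread(next, it, sentinel)) is not sentinel:
        yield item

async def main():
    stage1 = [v async for v in as_async_gen(sync_node(range(5)))]
    return [x async for x in async_node(stage1)]

print(asyncio.run(main()))  # [1, 3, 5, 7, 9]
```

`asyncio.to_thread` requires Python 3.9+; on older versions, `loop.run_in_executor(None, next, it, sentinel)` does the same job.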

Let me know what you think.

Hey, thanks for the fast reply

Before that, and as of today, your best (yet shitty) option is to run an io loop inside a node and yield synchronously the things you got asynchronously. This is a hell to write, because you need to accept/release messages as if they were already processed, and keep a buffer to release output messages in the future. Something you pretty much want to avoid writing.

I thought about something like this, but it is not viable because it won't scale: we will have several nodes with the same issue.

The next step towards aio-nobo is to rewrite the executor part of bonobo to be asynchronous first (incidentally, it can also process synchronous code)
[...]
There are still a lot of things to do before it's even testable in real-life conditions

I understand. I will take a deeper look at the whole codebase and at the files you pointed me to, to understand just how much "a lot" is.

Thanks again

Closing this issue, which was more of a discussion than an issue per se. Feel free to reopen if you can spare some energy for it.