Proposal: Rename ensure_future to create_task
asvetlov opened this issue · 32 comments
During my training sessions I am constantly forced to say:
`ensure_future()` is used for creating a new task in 99% of use cases.
The name is not obvious, but please remember this quirk.
Maybe renaming the function (while keeping a backward-compatible alias) could smooth the asyncio learning curve a bit?
To be honest, the most intuitive one was `asyncio.async`, but it got deprecated for some reason. I like `asyncio.async()` much more than either `asyncio.ensure_future()` or `asyncio.create_task()`.
I know that it is now a proper keyword, but maybe we could do something sensible with the keyword?
async def foo():
    something

def main():
    task = async foo()
To be honest, the most intuitive one was asyncio.async, but it got deprecated for some reason. I like asyncio.async() much more than either asyncio.ensure_future() or asyncio.create_task().
The reason is “async” becoming a keyword.
I know that it is now a proper keyword, but maybe we could do something sensible with the keyword?
async def foo():
    something

def main():
    task = async foo()
I don’t think this is a good idea. Making `async` an actual operator will require some sort of protocol to make it possible for frameworks other than asyncio to use it. Adding functions is so much easier.
And `async()` wasn't an obvious name in the first place. It doesn’t say anything about what it does. Why should I call the `async()` function on a coroutine, if coroutines are already "async"?
If it's a lot of work, I can understand. But IMHO `async` is quite clear in intent. It means that something executes asynchronously, as opposed to `await`, which runs a coroutine or task synchronously. (The fact that `async` creates a task would be considered a side effect; it just does whatever it takes to ensure the coroutine executes in the background.)
If I'm allowed just a bit more bike-shedding, how about a function `asyncio.start()`? It seems intuitive and is shorter to type than `asyncio.create_task()`.
`ensure_future` can accept `Future`s and `Task`s, in addition to coroutines. Given a coroutine, it will wrap it in a `Task`. Given a `Future` or `Task`, it will just return the passed object, doing nothing. The current name clearly reflects what the function is actually doing.
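A minimal sketch of the behaviour described in this comment, using only the standard `asyncio` module. The coroutine name `work` is made up for illustration, and `asyncio.run()` assumes Python 3.7 or later.

```python
import asyncio


async def work():
    # Hypothetical coroutine used only for illustration.
    await asyncio.sleep(0)
    return 42


async def main():
    loop = asyncio.get_running_loop()

    # A coroutine object gets wrapped in a new Task.
    task = asyncio.ensure_future(work())
    wrapped = isinstance(task, asyncio.Task)

    # An existing Task or Future is returned unchanged.
    same_task = asyncio.ensure_future(task) is task
    fut = loop.create_future()
    same_future = asyncio.ensure_future(fut) is fut

    fut.set_result(None)
    result = await task
    return wrapped, same_task, same_future, result


print(asyncio.run(main()))
```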
If we want to add a function that only accepts coroutines and turns them into tasks, I'd propose to call it `spawn`.
`start` isn't bad, but we're considering adding `asyncio.run` in the near future, so I think we need a more task-specific name.
Yes, `spawn` is better than `start()`, I agree.
If we add `spawn()`, which is supposed to be used with a coroutine as its argument, turning it into a Task, then I think we can leave `ensure_future()` alone, as it will mostly be used only internally by asyncio itself, and you can skip it in your training sessions.
+1 for `spawn`. Perfect name.
Thanks for having this discussion. :)
I have also been confused by the naming of `async` and `ensure_future`.
My question is: what's the key rationale for having both `asyncio.ensure_future` and `loop.create_task`?
According to the documentation, the only difference I see is that `ensure_future` returns a Future if the input is a Future, where cancellation works differently from a Task.
I still don't clearly understand "why" we have both Task and Future (especially behind a single API, `ensure_future`), though the documentation describes well "how" they work differently. My guess is that a Future is for a single awaited blocking call whereas a Task is for multiple chained blocking calls, including yielding control to the event loop. Am I correct? If so, I arrive at the same question Andrew pointed out: `ensure_future` is used to create tasks rather than to schedule futures in most use cases, so why should it be named `ensure_future`?
Another concern is that `loop.create_task` requires an explicit reference to the event loop while `asyncio.ensure_future` doesn't. Since `asyncio.get_event_loop()` now always returns the current scheduling loop inside coroutines, `loop.create_task` would make people write boilerplate code to pass `loop` arguments around.
So, how about `asyncio.schedule()`?
I think this would embrace both Task and Future if we want to keep the backward compatibility but also clarify the naming.
The current name `asyncio.ensure_future` feels like it "ensures" the full execution of the given Task or Future even when they raise exceptions, but this is not true.
The current name asyncio.ensure_future feels like that it "ensures" the full execution of the given Task or Future even when they have exceptions, but this is not true.
Oh, this is an interesting (to me) misunderstanding. The thing that is "ensured" is the type of the return value, not its eventual execution.
The point of `ensure_future()` is if you have something that could either be a coroutine or a `Future` (the latter includes a `Task` because that's a subclass of `Future`), and you want to be able to call a method on it that is only defined on `Future` (probably about the only useful example being `cancel()`). When it is already a `Future` (or `Task`) this does nothing; when it is a coroutine it wraps it in a `Task`.
If you know that you have a coroutine and you want it to be scheduled, the correct API to use is `create_task()`. The only time when you should be calling `ensure_future()` is when you are providing an API (like most of asyncio's own APIs) that accepts either a coroutine or a `Future` and you need to do something to it that requires you to have a `Future`.
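As an illustration of that last case, here is a sketch of an API that accepts either a coroutine or a `Future` and needs `cancel()`. The helper name `cancel_after` is hypothetical and not part of asyncio.

```python
import asyncio


async def cancel_after(awaitable, delay):
    """Hypothetical helper: cancel a coroutine or Future after `delay` seconds."""
    # ensure_future() guarantees we hold an object that has .cancel().
    fut = asyncio.ensure_future(awaitable)
    await asyncio.sleep(delay)
    fut.cancel()
    try:
        await fut
    except asyncio.CancelledError:
        return "cancelled"
    return "finished"


async def main():
    # Works with a bare coroutine ...
    first = await cancel_after(asyncio.sleep(10), 0.01)
    # ... and with an already-created Task.
    task = asyncio.ensure_future(asyncio.sleep(10))
    second = await cancel_after(task, 0.01)
    return first, second


print(asyncio.run(main()))
```

A caller who already knows it holds a coroutine would not need this indirection and could create the task directly.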
Hopefully this helps. I am beginning to feel that the main thing that's wrong here is that the treatment in the docs is not very good...
Thanks for the clarification. So `ensure_future()` ensures that the returned object is compatible with Future (either a Task or a Future).
Still, I doubt that it's a good name. The current name focuses on the input/output types rather than on what it does with the input. And what it does seems to overlap with `loop.create_task()`. Again, the name `create_task()` has its own problem: it lacks any sense of the side effect on the event loop (scheduling and execution of the created task). I think this is why people in this thread are suggesting `spawn()`-like names.
Now I have realized where my misunderstanding came from: `asyncio.async()` is an action-oriented name while `asyncio.ensure_future()` is a type-oriented name, but I assumed it was still action-oriented when it was first introduced in the standard library.
The docs need improvement as well, because several key code examples related to tasks use `ensure_future()` to create tasks from coroutines. Notable cases are:
- https://docs.python.org/3.6/library/asyncio-task.html#example-parallel-execution-of-tasks
- https://docs.python.org/3.6/library/asyncio-dev.html#chain-coroutines-correctly
I'd like to know if there are technical reasons to use `asyncio.ensure_future()` instead of `loop.create_task()` in the docs above (other than historical reasons, such as `create_task()` having been introduced later, in 3.4.2, and the docs simply not having been updated yet).
Guido, I understand you don't like `spawn()` or `start()`, but can we still try to come up with a short and intuitive name? I hate writing `asyncio.ensure_future()` or `loop.create_task()` (especially since my code usually doesn't have an explicit loop variable around).
It's a pity we cannot use `async()` anymore. I like @achimnol's `schedule` suggestion, anything is better than `ensure_future()`, really... `launch()`, `go()`? ... :-)
Here's a counter-question. Why are you using these functions? My guess is that you've got a coroutine that you want to run "in the background", i.e. without waiting for it. What does that coroutine do? Is it a long-running server task? But why not have a "main" task that waits for all your server tasks (using `gather()` if there could be more than one)? As an added benefit it should be possible to arrange for a cancellation of the main task to cancel all the server tasks automatically, and you could do this in response to a signal (details left as an exercise for the reader :-).
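One possible shape of that "main task" idea, as a sketch only. The names `server`, `main_task` and `supervisor` are invented for illustration, and a short sleep followed by `cancel()` stands in for the signal handler.

```python
import asyncio

cancelled = []  # names of server tasks that observed cancellation


async def server(name):
    # Hypothetical long-running server task.
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        cancelled.append(name)
        raise


async def main_task():
    # The "main" task simply waits for all server tasks.
    await asyncio.gather(server("heartbeat"), server("subscriber"))


async def supervisor():
    main = asyncio.ensure_future(main_task())
    await asyncio.sleep(0.01)  # let the servers start
    main.cancel()              # stand-in for a signal handler
    try:
        await main
    except asyncio.CancelledError:
        pass
    return sorted(cancelled)


print(asyncio.run(supervisor()))
```

Cancelling the main task cancels the `gather()` it is waiting on, which in turn cancels every server task that has not finished yet.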
Regarding the docs:
https://docs.python.org/3.6/library/asyncio-task.html#example-parallel-execution-of-tasks -- This should not need the `ensure_future()` calls at all -- `gather()` will call it for you when one of its arguments is a coroutine. Also it makes more sense to rewrite the `gather()` call without a list, like this:
loop.run_until_complete(gather(
    factorial("A", 2),
    factorial("B", 3),
    factorial("C", 4)))
(But before you propose a patch please test this suggestion!)
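A self-contained version of that suggestion might look like the following sketch. The `factorial` coroutine here is a simplified stand-in for the one in the docs, and the `gather()` call is placed inside a coroutine run with `asyncio.run()` (Python 3.7+) so that it is always made with a running loop.

```python
import asyncio


async def factorial(name, number):
    # Simplified stand-in for the coroutine used in the docs example.
    result = 1
    for i in range(2, number + 1):
        await asyncio.sleep(0.01)  # pretend to wait on I/O
        result *= i
    return name, result


async def main():
    # No ensure_future() calls: gather() wraps each coroutine itself.
    return await asyncio.gather(
        factorial("A", 2),
        factorial("B", 3),
        factorial("C", 4),
    )


results = asyncio.run(main())
print(results)
```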
https://docs.python.org/3.6/library/asyncio-dev.html#chain-coroutines-correctly -- This seems to be explaining that you should serialize your calls rather than trying to run them as background calls (I agree) and the final example shows that you don't need `ensure_future()` at all (and again I agree). Maybe we shouldn't have examples that show bad practices?
In the end I still believe that `ensure_future()` is an appropriately obscure name for a rarely-needed piece of functionality. When creating a task from a coroutine you should use the appropriately-named `loop.create_task()`. Maybe there should be an alias for that, `asyncio.create_task()`? (You could use `asyncio.Task()` but the docs explicitly say not to do that -- IIRC that's because uvloop has its own `Task` class that must be created using `loop.create_task()`. @1st1?)
Maybe there should be an alias for that asyncio.create_task()?
Yes, that's the idea. It would be nice to have a way to create tasks without explicit loop handling. So `asyncio.create_task(coroutine)` would work. I still like `spawn` more (even though `os.spawn` does something different).
(You could use asyncio.Task() but the docs explicitly say not to do that -- IIRC that's because uvloop has its own Task class that must be created using loop.create_task(). @1st1?)
The initial motivation for adding `loop.create_task()` was to make it possible for frameworks to inject their own Task implementation, to implement threadlocal-like objects in asyncio. And AFAIK some people use this feature (including me). Later I used `create_task` to implement a faster Task in uvloop.
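To make the injection point concrete, here is a small sketch using `loop.set_task_factory()`. The `TracedTask` class and the factory function are hypothetical; a real framework would attach its own context handling there.

```python
import asyncio


class TracedTask(asyncio.Task):
    """Hypothetical Task subclass; a framework could attach context here."""


def traced_task_factory(loop, coro, **kwargs):
    # Newer Python versions may pass extra keyword arguments
    # (for example `context`), so forward them untouched.
    return TracedTask(coro, loop=loop, **kwargs)


async def noop():
    return "done"


async def main():
    loop = asyncio.get_running_loop()
    loop.set_task_factory(traced_task_factory)
    # create_task() now goes through the factory.
    task = loop.create_task(noop())
    result = await task
    return type(task).__name__, result


print(asyncio.run(main()))
```

Code that instantiates `asyncio.Task()` directly would bypass the factory, which is one reason to go through `create_task()`.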
OK, I'm not totally against spawn() then. I think we should have a PEP
describing all the (major) improvements we want to make to asyncio in
Python 3.7 rather than doing it piecemeal -- it's easy to lose track of the
coherence of functionality and terminology.
I think I'll make a first draft of the PEP soon. I'll document the proposed `asyncio.run`, `asyncio.run_forever`, `asyncio.spawn` (or `create_task`), and we'll also need to think about `asyncio.run_in_executor`.
Quite separately, we really should try to find someone who wants to improve
the docs. Both the reference docs and the tutorial material we have on
docs.python.org are obviously lacking, and my theory is that that has
happened because none of the original asyncio core devs (including myself)
felt like writing them. :-(
I already found such a person! @appeltel, who helped me with documenting PEP 525 and 530, is willing to help me with the asyncio docs. I'm going to start working with him (and whoever else is interested) when the holidays are over. We'll have a fork of the CPython repo on GitHub to work on the docs project.
Here's a counter-question. Why are you using these functions? My guess is that you've got a coroutine that you want to run "in the background", i.e. without waiting for it. What does that coroutine do? Is it a long-running server task? But why not have a "main" task that waits for all your server tasks (using gather() if there could be more than one)? As an added benefit it should be possible to arrange for a cancellation of the main task to cancel all the server tasks automatically, and you could do this in response to a signal (details left as an exercise for the reader :-).
My use case for "spawning" tasks is mainly to run background tasks, such as heartbeat timers and event subscribers taking notifications from external network sources.
For background tasks, I usually keep a reference to the return value of `ensure_future` for such tasks and cancel them on termination. For this purpose, I like #465's idea of combining `yield` and `asyncio.run_forever` to make reference bookkeeping much simpler by using function scopes.
Particularly for timers, I use `ensure_future(actual_work)` again (nested) to prevent prolonged timer intervals if the actual work takes a non-negligible amount of time. I don't keep references to those inner tasks, for simplicity, and use other means to signal termination or allow a grace period during shutdown.
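The timer pattern described here could be sketched roughly as follows. All names and durations are invented for illustration. Unlike the description above, this sketch keeps references to the inner tasks in a set, since the asyncio documentation advises holding a reference to tasks that are still running.

```python
import asyncio


async def actual_work(log, tick):
    # Hypothetical job that takes longer than the timer interval.
    await asyncio.sleep(0.05)
    log.append(tick)


async def timer(log, interval, ticks):
    pending = set()  # keep references so running jobs are not lost
    for tick in range(ticks):
        # Start the job without waiting for it, so a slow job
        # does not stretch the interval between ticks.
        job = asyncio.ensure_future(actual_work(log, tick))
        pending.add(job)
        job.add_done_callback(pending.discard)
        await asyncio.sleep(interval)
    # Grace period at shutdown: wait for jobs still in flight.
    if pending:
        await asyncio.wait(pending)


async def main():
    log = []
    await timer(log, interval=0.01, ticks=3)
    return log


print(asyncio.run(main()))
```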
`gather` also serves my use cases pretty well, but I "feel" like it's for a more homogeneous set of tasks, i.e., doing the same work in parallel for different data or partitioned inputs (like SIMD or MapReduce).
Another use case I can think of is to implement parallelized request-reply handlers in network applications, though usually libraries such as aiozmq.rpc and aiohttp do `ensure_future` or `create_task` for us. Here we cannot use `gather` because request-reply handler tasks must start and finish asynchronously to maximize the system throughput. (i.e., `gather` is only useful when we need synchronized termination!)
I think `create_task` also has a lot of pedagogical value in addition to whatever productive use cases there may be.
In my own experience trying to follow the docs and learn how asyncio works, the first thing I wanted to do was make a simple "hello world"-like coroutine and run it, ideally in the REPL. Being able to play with asyncio in the REPL and start/stop the event loop as shown below is helpful for a learner to easily experiment with running coroutines:
>>> import asyncio
>>> loop = asyncio.get_event_loop()
>>> async def foo(name):
...     print(f'{name} is starting')
...     await asyncio.sleep(2)
...     print(f'{name} is finishing')
...
>>> alice_task = loop.create_task(foo('Alice'))
<Task pending coro=<foo() running at <stdin>:1>>
>>> bob_task = loop.create_task(foo('Bob'))
<Task pending coro=<foo() running at <stdin>:1>>
>>> loop.run_until_complete(asyncio.sleep(5))
Alice is starting
Bob is starting
Alice is finishing
Bob is finishing
Once the learner has a sense of how to schedule tasks, the utility and behavior of `wait` and `gather` are easier to understand.
Personally, I find the name `create_task` to be reasonably understandable, provided that I think of being scheduled as a property that is (or should be) inherent in a task object. As a consumer of asyncio, the model that I tend to have in my mind is that I create a coroutine object which is ready to run, then I deliver it to the event loop with `my_task = loop.create_task(my_coro)`. The object `my_task` that I get back I imagine as a sort of receipt or tracking number that I can use to check on or cancel my running coroutine, like I would a physical package.
`schedule()` seems a bit more natural to me than `spawn()`, since in the case of coroutines I have already created the coroutine object: something has already been "spawned" in the sense that an object now exists and ought to be run or I will get a RuntimeWarning. I just need to pass it to some event loop to wrap it in a task and run it for me. With `os.spawn*()` nothing has really been allocated before the call; I just pass a string and then something will be created and run.
In any case, explaining to someone the meaning of `create_task`, `spawn`, or `schedule` would be much more natural than `ensure_future`. The only thing that would be a bit confusing/surprising is if the implicit and explicit loop commands were different, i.e. if there was only `loop.create_task()` and `asyncio.spawn()`.
But that example should be written without explicitly creating tasks at all:
loop.run_until_complete(asyncio.gather(foo('Alice'), foo('Bob'), asyncio.sleep(5)))
I suspect that overusing explicit tasks is probably a symptom of classical "thread" thinking, and I'd like us to carefully steer new users away from that paradigm.
Previously @achimnol wrote:
gather also serves my use cases pretty well, but I "feel" like it's for more homogeneous set of tasks doing the same work in parallel for different data or partitioned inputs (like SIMD or MapReduce).
That's also a dangerous line of thought, since `asyncio` does not do "work" in parallel: it does not let you use multiple cores. The event loop is guaranteed to run in a single thread, so even removing the GIL would not change that. As the name implies, `asyncio` is for doing I/O concurrently. And `gather()` is actually the main primitive for introducing concurrency!
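A small sketch can make the "concurrent but single-threaded" point visible. Here `asyncio.sleep()` stands in for real I/O, and the `fake_io` name and the durations are arbitrary choices for illustration.

```python
import asyncio
import threading


async def fake_io(delay):
    # Stand-in for an I/O wait; reports which thread ran the coroutine.
    await asyncio.sleep(delay)
    return threading.get_ident()


async def main():
    loop = asyncio.get_running_loop()
    started = loop.time()
    thread_ids = await asyncio.gather(fake_io(0.1), fake_io(0.1), fake_io(0.1))
    elapsed = loop.time() - started
    return set(thread_ids), elapsed


ids, elapsed = asyncio.run(main())
print(len(ids), round(elapsed, 2))
```

The three waits overlap, so the total time is close to one delay rather than the sum of three, yet every coroutine reports the same thread.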
It seems one of the problems we have here is that there are too many ways to introduce concurrency -- we have at least three API layers, represented by callbacks, tasks and coroutines. But callbacks should be avoided if at all possible (they are the low-level machinery used to implement the rest, and to provide interoperability with other frameworks). Coroutines (in the form of `async def` and `await`) are the API layer of preference. Tasks (while always wrapping a coroutine) sit in between, with `gather()` being the most important glue between Tasks and coroutines. (The `Future` API is the main bridge between callbacks and Tasks FWIW.)
Of course, realistic networked applications occasionally need things like watchdog tasks that are best implemented using explicit tasks, but these are just a few of the many things that networked applications need in order to be production-ready and robust. There ought to be no need to introduce the lower-level primitives right away in a tutorial.
It's quite possible that once we have an outline for a tutorial ready we'll come up with some tweaks to the high-level API (e.g. the `run()` method that's being proposed). But I don't think we should start with API change proposals before we have even figured out the "right" way to write a tutorial.
I suspect that overusing explicit tasks is probably a symptom of classical "thread" thinking, and I'd like us to carefully steer new users away from that paradigm.
I like your approach to introduce concurrent I/O with coroutines as a new paradigm. 👍
That's also a dangerous line of thought, since asyncio does not do "work" in parallel: it does not let you use multiple cores. The event loop is guaranteed to run in a single thread, so even removing the GIL would not change that. As the name implies, asyncio is for doing I/O concurrently. And gather() is actually the main primitive for introducing concurrency!
Yes, I understand that asyncio is not for computational parallelism but for concurrent I/O. That's also why I chose asyncio: my apps mostly do I/O rather than calculations.
I look forward to seeing the improved tutorial as well as the more intuitive high-level APIs you mentioned. I can also lend a hand if there is something I could help with.
Though I agree with @gvanrossum's approach of writing better tutorials and guiding new users to the coroutine way of thinking, including the tweaks for synchronous high-level APIs proposed in #465, I'd like to mention that those do not resolve this issue. 🙄 Even if only occasionally, we still need task-creation-followed-by-asynchronously-scheduling-it functionality, and `loop.create_task` lacks the sense of asynchronous execution and requires an explicit loop variable everywhere (which can now technically be avoided). I can accept that `ensure_future` and its new alias may be kept out of beginner tutorials, but we still need a better motivation for the naming.
Thanks for the explanation of the intended design philosophy, @gvanrossum. From the current documentation it is clear that the underlying layer of callbacks should be transparent to most users, but I was left with the impression that the average and even the beginning user should concern themselves with tasks.
And gather() is actually the main primitive for introducing concurrency!
So `asyncio.gather()` returns a `Future`, but if I am presenting this in an introduction that doesn't describe Tasks I suppose I should really look at it as just another type of awaitable. This can be awaited in other coroutines, passed to `loop.run_until_complete()`, or even passed to `asyncio.gather()`. The only basic way to have coroutines run concurrently is to await them as a group using gather.
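A brief sketch of treating the result of `gather()` as just another awaitable, including passing it to another `gather()`. The `double` coroutine is a made-up example.

```python
import asyncio


async def double(x):
    # Hypothetical coroutine used only for illustration.
    await asyncio.sleep(0)
    return 2 * x


async def main():
    inner = asyncio.gather(double(1), double(2))
    is_future = asyncio.isfuture(inner)
    # The gathered group is itself awaitable, so it can be gathered again.
    outer = await asyncio.gather(inner, double(3))
    return is_future, outer


print(asyncio.run(main()))
```

The nested group shows up as a nested list in the outer result, in the order the awaitables were passed.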
Then if this simple API is no longer sufficient, if I need a coroutine to schedule one or more awaitables in some manner other than simply awaiting them, the concept of a Task can be introduced, along with `loop.create_task()`.
If the above is correct then I guess my previous comment was a bit misinformed - I see that there is no real reason to discuss `create_task` or `ensure_future` at all in a beginner tutorial. To me this resolves the issue as initially stated in a broader sense - to smooth the asyncio learning curve one just doesn't consider Tasks at all except as an advanced topic.
Yes, that's right. When using `await` (or `yield from`) the difference between coroutines and `Future`s (and between the latter and `Task`s) can be ignored. It's only when calling `.cancel()` that you need to know what you have -- but I believe in a situation where you'd be interested in cancelling you'd naturally have a `Future` or `Task` already (unless you're writing a framework).
I expect that the trickiest bit to get to happen naturally may actually be termination. I noticed that curio has some sleight of hand around that too (and the tutorial spends a lot of time on this topic).
@achimnol I am fine with adding `asyncio.create_task()`. By the time we are ready to talk about it we should have no problem explaining what a task is and what its life cycle looks like.
But any name that might suggest that the newly created task automatically starts running is actually dangerous, because it doesn't -- at least not until you `await` something. That's a big difference with `threading.Thread().start()`.
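The deferred start can be observed directly. This sketch assumes the default task creation behaviour (no eager task factory installed), and the `child` coroutine is invented for illustration.

```python
import asyncio


async def child(log):
    log.append("child ran")


async def main():
    log = []
    task = asyncio.ensure_future(child(log))
    before = list(log)      # the new task has not run yet
    await asyncio.sleep(0)  # yield to the event loop once
    after = list(log)       # now the task has had its turn
    await task
    return before, after


print(asyncio.run(main()))
```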
I realize that `schedule()` would be a bit clearer than `start()`, but to the first-time reader it still sounds potentially misleading. I want the API description to naturally give the user the right mental model, and that mental model should be that at most one task can be running at any time, and tasks allow other tasks to run by using `await`. (Also, a coroutine is always running in the context of a task -- but a single task can be responsible for a whole stack of coroutines, chained via `await`. And how this actually works is mostly an advanced topic that you don't need to be aware of until much later.)
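The parenthetical about one task owning a whole stack of coroutines can be checked with `asyncio.current_task()` (Python 3.7+). The coroutine names below are arbitrary.

```python
import asyncio


async def inner():
    return asyncio.current_task()


async def middle():
    # Awaiting a coroutine directly keeps it in the caller's task.
    return await inner()


async def main():
    me = asyncio.current_task()
    via_await = await middle()                       # same task, deeper stack
    via_task = await asyncio.ensure_future(inner())  # runs in a separate task
    return via_await is me, via_task is me


print(asyncio.run(main()))
```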
I just wanted to add that I was explaining asyncio to a couple of coworkers and I found that they were also confused by the name `ensure_future`. I also remember that it kind of bothered me in the beginning and I had to actually read asyncio code to understand what's going on.
Happy new year to everybody (though it's a bit late)!
But any name that might suggest that the newly created task automatically starts running is actually dangerous, because it doesn't -- at least not until you await something. That's a big difference with threading.Thread().start().
@gvanrossum Yes, because the task is a coroutine task, it will get a chance to run after the coroutine that created it yields control to the event loop (`await`).
The main point of confusion here is the naming -- many people (including me) are too familiar with existing multi-threading terms. In this sense, we might need to rethink the name "task" more aggressively -- maybe change all occurrences of "tasks" in user-facing APIs and docs to "coroutines"? Still, `create_task` implies (automatic but deferred) execution of the task. If that behavior is specific to coroutines, why not just say coroutine instead of task?
Or what about renaming `create_task` to `queue_coroutine`? It technically queues it for when someone calls either `run_forever` or `run_until_complete`, right?
But any name that might suggest that the newly created task automatically starts running is actually dangerous, because it doesn't -- at least not until you await something. That's a big difference with threading.Thread().start().
Don't forget about people who use daemon threads as well; some of them use `run_coroutine_threadsafe` to call a coroutine in that thread safely.
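For readers unfamiliar with that pattern, a minimal sketch might look like this. The helper names and the `add` coroutine are hypothetical, and a real application would also close the loop once the thread has stopped.

```python
import asyncio
import threading


async def add(a, b):
    await asyncio.sleep(0.01)
    return a + b


def start_background_loop():
    """Start an event loop in a daemon thread and return the loop."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop


def call_from_main_thread():
    loop = start_background_loop()
    # Returns a concurrent.futures.Future, usable from ordinary threads.
    future = asyncio.run_coroutine_threadsafe(add(1, 2), loop)
    try:
        return future.result(timeout=5)
    finally:
        loop.call_soon_threadsafe(loop.stop)


print(call_from_main_thread())
```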
@AraHaan I have an objection to that. A queue inside the event loop is an implementation detail; it may not use a queue at all. Something like `register_coroutine` would be more generic in your sense.
@gvanrossum To be clear, I can also accept `asyncio.create_task`. I'd just like to point out that "task" may be too general to express coroutine-specific behaviors. Though, if we cannot find a better alternative, I'm fine with it.
Just a question: I guess the reason that task and coroutine exist separately in asyncio is that the same coroutine may be executed using different event loop schedulers, e.g., curio instead of asyncio. In this sense, a task is a coroutine associated with an event loop scheduler. In other words, coroutine is a language-level term (like another form of function) and task is a library-level term with a concrete execution context. Am I right?
Tasks and coroutines are different! They should not be confused. A coroutine is simply a function defined with `async def`. Coroutines can be used for different frameworks, but it's usually not possible to write a single coroutine that is framework-agnostic (because typically you will use other facilities from the framework, such as Futures or `asyncio.sleep()`). Tasks are an asyncio concept representing an object wrapping a coroutine.
Please just stop this debate. We can add a top-level function `asyncio.create_task(coro)` which calls `asyncio.get_event_loop().create_task(coro)` and we should change the examples in the docs to use that instead of `ensure_future()` -- the latter will still exist, with its current (different) semantics, but few people will need to call it.
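The proposed helper, exactly as worded in this comment, could be sketched as below. This is only an illustration of the proposal and not the standard library's implementation; the local `create_task` function and the `greet` coroutine are written here for the example.

```python
import asyncio


def create_task(coro):
    """Sketch of the proposed top-level helper (not the stdlib code)."""
    return asyncio.get_event_loop().create_task(coro)


async def greet(name):
    await asyncio.sleep(0)
    return f"hello {name}"


async def main():
    # No explicit loop variable is needed at the call site.
    task = create_task(greet("Alice"))
    return isinstance(task, asyncio.Task), await task


print(asyncio.run(main()))
```

Inside a coroutine, `asyncio.get_event_loop()` returns the loop that is currently running, which is what lets the helper drop the explicit `loop` argument.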
Thanks for the explanation. I don't have any better idea than `asyncio.create_task` now either, considering Guido's design philosophy and direction for remedying this issue. +1 to ending it here.
This is probably the wrong place to be asking about this -- but is there an analog to the `go` statement in the works? Is that what `spawn` would do? Because that would be nice... :)