joerick/pyinstrument

Question: missing async-mode flag when running as a module

PabloRuizCuevas opened this issue · 4 comments

This is probably more of a feature request than a question. I see that the library provides async support, but I couldn't find a command-line flag for it, and it isn't documented. In the code I found:

    # there is no point using async mode for command line invocation,
    # because it will always be capturing the whole program, we never want
    # any execution to be <out-of-context>, and it avoids duplicate
    # profiler errors.

But I don't understand why "there is no point" in it. I would find it very useful when you have many disconnected async functions in different places in the code. And if activating it causes errors, then I guess that would be a bug.

Async mode works by limiting profiling to one async context. For example, you could start a profiler in the context of a web request, and then when you await on a database call, that's detected as 'out-of-context', and so the time spent there is attributed to the await.

But when starting from the command line, at the start of any program there is no async context because the program hasn't started any async coroutines yet.

This comment means that setting async_mode would have no effect, because what that does is 'tag' the contextvars context with the profiler. But this context is copied into all subsequent async tasks, so the entire program is considered 'in-context' from the perspective of the profiler. As a result, we can't track awaits.
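To illustrate the context copying described above, here is a minimal standard-library sketch. It does not use pyinstrument's internals: `active_profiler`, `worker`, `tagging_worker` and the other names are hypothetical stand-ins. It only assumes that the profiler's 'tag' behaves like an ordinary `contextvars.ContextVar`.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the variable a profiler might use to tag its context.
active_profiler = contextvars.ContextVar("active_profiler", default=None)


async def worker(name, seen):
    # Record whichever tag is visible from inside this task.
    seen[name] = active_profiler.get()


async def tagging_worker(name, seen):
    # Set the tag inside this task only, like starting a profiler in one request.
    active_profiler.set("request")
    seen[name] = active_profiler.get()


async def run_two(first, second):
    # Each task gets a copy of the context that is current when it is created.
    seen = {}
    await asyncio.create_task(first("first", seen))
    await asyncio.create_task(second("second", seen))
    return seen


def tagged_at_startup():
    # Like a command-line run: the tag is set before any coroutine exists,
    # so every task created later inherits it.
    def entry():
        active_profiler.set("cli")
        return asyncio.run(run_two(worker, worker))

    # Run in a copied context so the tag does not leak into the caller.
    return contextvars.copy_context().run(entry)


def tagged_inside_one_task():
    # The tag is set inside a single task, so its sibling never sees it.
    def entry():
        return asyncio.run(run_two(tagging_worker, worker))

    return contextvars.copy_context().run(entry)


if __name__ == "__main__":
    print(tagged_at_startup())
    print(tagged_inside_one_task())
```

With the tag set at startup, both tasks see it, so nothing is ever 'out-of-context'. With the tag set inside one task, the sibling task sees the default value instead.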

I can kinda see what you mean about wanting to profile a program from the perspective of many async tasks. I'm not sure if that would be possible. I'd have to hook into the creation of coroutines.
I also haven't figured out how to do multiple threads yet, which might be the first port of call :)

What's your program, out of interest?

Hi, thanks for your detailed response. My program just transforms and checks files, but it is a very long process, and it has a couple of services that run concurrently, fetching files and gathering async requests, usually using asyncio.to_thread() (some libraries I use are not properly built for async operations, so I need to put them in another thread), which may also cause issues, so I have both problems at once.

The rest of the code is synchronous, but I'm experimenting with moving more parts of the code into async functions, with the drawback that they are obscured in pyinstrument's output. Of course I can analyze them one by one, but I thought it would be nice to have everything at once.

In any case, many thanks for the fantastic profiler, I love it :) If there is anything I can help with, tell me.

Yeah, I think the best bet for you would be to figure out which of those services is the slowest (i.e. the rate-determining step), and run pyinstrument from inside the async function/method.

From the perspective of Pyinstrument, aside from figuring out how to get notified of new async contexts, I don't think it's easy to decide which of those contexts you care about (e.g. your service) versus the frames which are stuff you don't (e.g. waiting on file I/O). I might be wrong there, maybe there's a way, but it's not obvious to me.

I think everything is interesting. I would love to see each coroutine displayed in one color when working and another when waiting, but I need to take a look at the code to understand a bit more of the things you mention, because I have no idea yet how feasible it is.

In any case, feel free to close the issue.