[Issue]: Asynchronous execution with hipExtModuleLaunchKernel
Closed this issue · 6 comments
Problem Description
It is my understanding that by passing hipExtAnyOrderLaunch as the last argument to this entry point, hipExtModuleLaunchKernel, I could achieve asynchronous execution of the kernels that I'm dispatching.
So I can have a single hipStream_t to which I dispatch my kernels by calling hipExtModuleLaunchKernel with the above flag for each kernel, and they will execute asynchronously. Is that correct?
I've been experimenting with it but couldn't achieve this behaviour. I used a single nonBlocking stream, but all the kernels I launched with the above entry point executed synchronously, despite setting the required flag to 1. I inspected this using rocprof, with https://ui.perfetto.dev/ as the GUI, to check whether the kernels execute asynchronously.
Would you be able to provide me with an example of how to use this particular feature to achieve concurrency in a single stream, and of how to profile it to see the correct behaviour? Thank you!
Operating System
Ubuntu
CPU
AMD EPYC 7763 64-Core Processor
GPU
AMD Instinct MI210
ROCm Version
ROCm 6.0.0
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response
Hello @konradkusiak97, is it possible to share the sample?
Hello @konradkusiak97, can you also share the device info? Thanks!
Hi @jaydeeppatel1111, thanks for the reply. I was experimenting with this feature in our unified-runtime project, so I don't have an easy reproducer, but I can try to make one.
What I'm really interested in is an example, for instance an existing test that calls hipExtModuleLaunchKernel several times with the hipExtAnyOrderLaunch flag, submitting each kernel to the same hipStream_t, and then checks (for instance in the profiler) that those kernels indeed run asynchronously.
In any case, I'll try to make a reproducer for that. The device info:
Marketing Name: AMD EPYC 7763 64-Core Processor
Name: gfx90a
Marketing Name: AMD Instinct MI210
Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
Let me know if more verbose output from rocminfo would be better.
Hello @konradkusiak97, thank you for raising this issue.
After investigation, we found that hipExtAnyOrderLaunch is not supported on GFX9XX cards, and the documentation has been updated.
Thanks!
Thanks for following up on this @jtpatel!
This is surprising to me. I thought GFX9XX cards were widely supported in HIP. Are you able to share more details on why this feature doesn't work? Do you have any code example that I could experiment with to observe this feature working on other AMD architectures?
Hello @konradkusiak97, the sample is to have a few kernels with printfs that depend on each other, i.e. 2 depends on 1, 3 depends on 2, etc., and to launch those kernels on the same stream. If you see them executing serially, it means anyOrder is not honored. You can use multiple streams to get async behavior on GFX9xx. It looks like hipExtAnyOrderLaunch should be honored on other cards, i.e. Navi, which is why the documentation update mentions GFX9xx.
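For the profiling side of the question, the "serial or not" check can also be done on kernel begin/end timestamps exported from a trace instead of by eye in the GUI. Below is a minimal host-only C++ sketch of that check. `KernelSpan` and `any_overlap` are hypothetical names made up for illustration; they are not part of HIP or rocprof, and the field names do not correspond to rocprof's actual output columns. It assumes you have already extracted one begin/end pair per kernel dispatch, in the same time unit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record: one kernel execution taken from a profiler trace.
struct KernelSpan {
    std::string name;
    std::uint64_t begin_ns;  // time the kernel started executing
    std::uint64_t end_ns;    // time the kernel finished executing
};

// Returns true if at least two kernels were executing at the same time.
// Kernels that merely touch (one ends exactly when the next begins)
// are treated as serial.
bool any_overlap(std::vector<KernelSpan> spans) {
    // Sort by start time so each kernel only needs to be compared against
    // the latest end time seen so far.
    std::sort(spans.begin(), spans.end(),
              [](const KernelSpan& a, const KernelSpan& b) {
                  return a.begin_ns < b.begin_ns;
              });

    std::uint64_t latest_end = 0;
    for (std::size_t i = 0; i < spans.size(); ++i) {
        assert(spans[i].begin_ns <= spans[i].end_ns);  // sanity-check the input
        if (i > 0 && spans[i].begin_ns < latest_end) {
            return true;  // started before an earlier kernel had finished
        }
        latest_end = std::max(latest_end, spans[i].end_ns);
    }
    return false;
}
```

If this returns false for kernels launched on one stream with the flag set, they ran back to back, which matches the serial behaviour described above for GFX9xx. Running the same check on a multi-stream launch is a reasonable way to confirm that the trace and the check themselves can show overlap.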