ROCm/HIP

[Issue]: Asynchronous execution with hipExtModuleLaunchKernel

Problem Description

It is my understanding that by passing hipExtAnyOrderLaunch as the last argument (flags) to hipExtModuleLaunchKernel, I can achieve asynchronous execution of the kernels I dispatch.

So I can have a single hipStream_t to which I dispatch my kernels by calling hipExtModuleLaunchKernel with the above flag for each kernel, and they will execute asynchronously. Is that correct?

I've been experimenting with it but couldn't achieve this behaviour. I used a single non-blocking stream, but all the kernels I launched through the above entry point executed serially, despite setting the flag to 1. I checked this by profiling with rocprof and viewing the trace in https://ui.perfetto.dev/ to see whether the kernels overlap.

Would you be able to provide me with an example of how to use this particular feature to achieve concurrency in a single stream, and of how to profile it to confirm the correct behaviour? Thank you!
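One way to make the "do the kernels overlap?" check less dependent on eyeballing a trace is to compare the begin/end timestamps of each dispatch directly. The following is a minimal sketch in plain C++, not part of HIP or rocprof: `KernelInterval` and `count_overlapping_pairs` are hypothetical names, and it assumes the profiler output provides a begin and an end timestamp in nanoseconds per kernel (for example, the `BeginNs` and `EndNs` columns of a rocprof CSV, which is an assumption about the output format in use).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One kernel dispatch as reported by a profiler (hypothetical helper type):
// kernel name plus GPU begin/end timestamps in nanoseconds.
struct KernelInterval {
    std::string name;
    std::uint64_t begin_ns;
    std::uint64_t end_ns;
};

// Counts the pairs of kernels whose execution intervals overlap in time.
// A result of 0 means every kernel finished before the next one started,
// i.e. the dispatches were effectively serialized.
std::size_t count_overlapping_pairs(std::vector<KernelInterval> kernels) {
    // Sort by start time so that, for a given kernel, all later kernels
    // that start before it ends are contiguous in the vector.
    std::sort(kernels.begin(), kernels.end(),
              [](const KernelInterval& a, const KernelInterval& b) {
                  return a.begin_ns < b.begin_ns;
              });

    std::size_t pairs = 0;
    for (std::size_t i = 0; i < kernels.size(); ++i) {
        for (std::size_t j = i + 1;
             j < kernels.size() && kernels[j].begin_ns < kernels[i].end_ns;
             ++j) {
            ++pairs;  // kernel j started while kernel i was still running
        }
    }
    return pairs;
}
```

Feeding it the intervals of the kernels launched on the stream under test gives a quick yes/no answer: a non-zero count means at least two kernels were in flight at the same time.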

Operating System

Ubuntu

CPU

AMD EPYC 7763 64-Core Processor

GPU

AMD Instinct MI210

ROCm Version

ROCm 6.0.0

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

Hello @konradkusiak97, is it possible to share the sample?

Hello @konradkusiak97, can you also share the device info? Thanks!

Hi @jaydeeppatel1111, thanks for the reply. I was experimenting with this feature in our unified-runtime project, so I don't have an easy reproducer, but I can have a go at making one.

All I'm really interested in is an example, for instance an existing test that calls hipExtModuleLaunchKernel several times with the hipExtAnyOrderLaunch flag, submitting each kernel to the same hipStream_t, and then checks (for instance in the profiler) that those kernels do indeed run asynchronously.

In any case, I'll try to make a reproducer for that. The device info:

  Marketing Name:          AMD EPYC 7763 64-Core Processor
  Name:                    gfx90a
  Marketing Name:          AMD Instinct MI210
      Name:                    amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-

Let me know if more verbose output from rocminfo would be better.

Hello @konradkusiak97, thank you for raising this issue.

After investigation, we found that hipExtAnyOrderLaunch is not supported on GFX9XX cards, and the documentation has been updated:

https://rocm.docs.amd.com/projects/HIP/en/latest/doxygen/html/group___module.html#ga73d0c5f72869e258aa4899a829d9645c

Thanks!

Thanks for following up on this @jtpatel!

This is surprising to me. I thought GFX9XX cards were widely supported in HIP. Are you able to share more details on why this feature doesn't work? Do you have a code example that I could experiment with to observe this feature working on other AMD architectures?

Hello @konradkusiak97, a sample would be a few kernels with printfs that depend on each other (2 depends on 1, 3 depends on 2, and so on), all launched on the same stream. If you see them executing serially, it means anyOrder is not honored. On GFX9xx you can use multiple streams to get asynchronous behavior. It looks like hipExtAnyOrderLaunch should be honored on other cards, e.g. Navi, which is why the documentation update mentions GFX9xx specifically.
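The multiple-streams suggestion can be sketched roughly as follows. This is an illustrative example rather than an official ROCm sample: `spin_kernel`, `HIP_CHECK`, the stream count, and the spin duration are all made up for the illustration, and it uses an ordinary kernel launch instead of hipExtModuleLaunchKernel to stay self-contained. Each kernel goes to its own non-blocking stream, so the runtime is free to overlap them. Whether they actually overlap still depends on the device having free resources.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical error-checking helper, not part of the HIP API.
#define HIP_CHECK(expr)                                            \
    do {                                                           \
        hipError_t err_ = (expr);                                  \
        if (err_ != hipSuccess) {                                  \
            std::fprintf(stderr, "%s failed: %s\n", #expr,         \
                         hipGetErrorString(err_));                 \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

// Hypothetical busy-wait kernel: spins for roughly `cycles` clock ticks so
// that each dispatch is long enough to be clearly visible in a trace.
__global__ void spin_kernel(long long cycles, int id) {
    long long start = clock64();
    while (clock64() - start < cycles) {
    }
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        printf("kernel %d done\n", id);
    }
}

int main() {
    constexpr int kNumStreams = 4;            // arbitrary choice for the sketch
    constexpr long long kCycles = 100000000;  // arbitrary spin length

    // One non-blocking stream per kernel, so no implicit ordering ties
    // the dispatches together.
    hipStream_t streams[kNumStreams];
    for (int i = 0; i < kNumStreams; ++i) {
        HIP_CHECK(hipStreamCreateWithFlags(&streams[i], hipStreamNonBlocking));
    }

    // Small launches (1 block of 64 threads) leave plenty of compute units
    // free, giving the kernels a chance to run side by side.
    for (int i = 0; i < kNumStreams; ++i) {
        hipLaunchKernelGGL(spin_kernel, dim3(1), dim3(64), 0, streams[i],
                           kCycles, i);
        HIP_CHECK(hipGetLastError());
    }

    for (int i = 0; i < kNumStreams; ++i) {
        HIP_CHECK(hipStreamSynchronize(streams[i]));
    }
    for (int i = 0; i < kNumStreams; ++i) {
        HIP_CHECK(hipStreamDestroy(streams[i]));
    }
    return 0;
}
```

Built with hipcc and run under the profiler, the four dispatches should appear on separate queues in the trace. If they overlap there but the single-stream hipExtAnyOrderLaunch variant does not, that is consistent with the flag not being honored on the device.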