oroOccupancyMaxActiveBlocksPerMultiprocessor always returns hipErrorInvalidDeviceFunction
AtsushiYoshimura0302 opened this issue · 7 comments
Hi, the API oroOccupancyMaxActiveBlocksPerMultiprocessor always fails. I tried both the "main" branch and "release/hip6.0_cuda12.2" on an RX7900XTX and an RTX4090, and it failed in every case.
This can be reproduced by adding the following to "SimpleDemo":
int numBlocks = 0;
oroError_t error = oroOccupancyMaxActiveBlocksPerMultiprocessor( &numBlocks, function, 128, 0 );
printf( "occupancy api %d %d\n", error, numBlocks ); // shows occupancy api 98 0
Can anyone help?
The function is correctly bound to HIP ( hipOccupancyMaxActiveBlocksPerMultiprocessor ) so I don't think the bug is related to Orochi.
I confirm I'm reproducing the same error code: hipErrorInvalidDeviceFunction (98).
@RichardGe Thank you for checking, but I found the reason.
Orochi has to use hipModuleOccupancyMaxActiveBlocksPerMultiprocessor / cuOccupancyMaxActiveBlocksPerMultiprocessor instead of hipOccupancyMaxActiveBlocksPerMultiprocessor / cudaOccupancyMaxActiveBlocksPerMultiprocessor, since Orochi uses runtime compilation. The driver API and the runtime API treat the function pointer differently, and the current binding targets the runtime API.
I think there are some more incorrect bindings, e.g. oroFuncGetAttributes().
I confirmed the behavior with HIP SDK6.1 and https://github.com/ROCm/rocm-examples.git (92786e2 - Add source format linting to the GitHub workflows (#140)) and https://github.com/NVIDIA/cuda-samples
Some clarification here:
Which of the two functions to use depends on where the function pointer passed as a parameter comes from.
For example, in the CUDA case:
If the function pointer originally comes from something like cuModuleGetFunction(), the remaining calls should also be bound to "cu" functions rather than "cuda" ones; the two cannot be mixed.
Note: In CUDA, runtime API functions start with "cuda" and driver API functions start with "cu".
The same applies to HIP.
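To make the rule concrete, here is a minimal sketch in plain C++ that models it without a GPU. Everything in it is hypothetical: KernelHandle, HandleOrigin, runtimeOccupancy and moduleOccupancy are stand-ins invented for illustration, not real HIP, CUDA, or Orochi types or calls, and the occupancy value is made up. The only detail taken from this thread is the error code 98 (hipErrorInvalidDeviceFunction).

```cpp
#include <cstddef>

// Hypothetical stand-ins for illustration only; none of these names exist
// in HIP, CUDA, or Orochi.
enum class HandleOrigin {
    RuntimeSymbol,  // models a kernel symbol known to the runtime API
    DriverModule    // models a handle from cuModuleGetFunction() and friends
};

struct KernelHandle {
    HandleOrigin origin;
    int maxBlocksPerSM;  // made-up occupancy value for this sketch
};

constexpr int kSuccess = 0;
constexpr int kInvalidDeviceFunction = 98;  // error code reported in this thread

// Models a runtime-API style query ("cuda..." / "hip..." without "Module").
// It only accepts handles that originate from the runtime side.
int runtimeOccupancy(int* numBlocks, const KernelHandle& f,
                     int /*blockSize*/, std::size_t /*dynSharedMem*/) {
    if (f.origin != HandleOrigin::RuntimeSymbol) {
        return kInvalidDeviceFunction;  // numBlocks is left untouched
    }
    *numBlocks = f.maxBlocksPerSM;
    return kSuccess;
}

// Models a driver-API style query ("cu..." / "hipModule...").
// It only accepts handles that originate from a loaded module.
int moduleOccupancy(int* numBlocks, const KernelHandle& f,
                    int /*blockSize*/, std::size_t /*dynSharedMem*/) {
    if (f.origin != HandleOrigin::DriverModule) {
        return kInvalidDeviceFunction;  // numBlocks is left untouched
    }
    *numBlocks = f.maxBlocksPerSM;
    return kSuccess;
}
```

In this model, a handle built as KernelHandle{ HandleOrigin::DriverModule, 8 } makes runtimeOccupancy return 98 and leave numBlocks unchanged, while moduleOccupancy returns 0 and sets numBlocks to 8. That mirrors the symptom in the report: the handle itself is fine, but it was given to the query from the other API family.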
Hi @AtsushiYoshimura0302, we investigated with @KaoCC. In SimpleDemo, the function is taken from oroModuleGetFunction, so you need to use oroModuleOccupancyMaxActiveBlocksPerMultiprocessor instead of oroOccupancyMaxActiveBlocksPerMultiprocessor.
I tested it, and it worked.
So, I think we can close this ticket.
Ah, thank you for finding that out and checking.