ayazhassan/gpuocelot

Add support for CUDA 5 features: dynamic parallelism, etc.

Opened this issue · 1 comment

Just a reminder of features added in CUDA 5.0 that would be good to have 
in gpuocelot:
* SM_30 and SM_35 PTX intrinsics support
* Dynamic parallelism
* Object linking? I don't know if that makes sense here.

Original issue reported on code.google.com by rtf...@gmail.com on 17 May 2012 at 1:59

For object linking, NVCC has so far only announced support for CUBIN 
linking. They will also support PTX linking in the future. At that point, 
Ocelot should be able to support it with only minor changes, so I plan to 
wait for that feature. In the meantime, it is possible to 'link' PTX files 
together by simply concatenating them.
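
As a rough illustration of the concatenation workaround, here is a minimal C++ sketch. The function name `concatenate_ptx` is made up for this example and is not part of Ocelot. It assumes each PTX module has already been read into a string and does nothing more than join them in order.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not an Ocelot API): joins PTX modules that have
// already been read into memory, in the order given.
std::string concatenate_ptx(const std::vector<std::string>& modules) {
    std::string combined;
    for (const std::string& module : modules) {
        combined += module;
        // Make sure the next module starts on its own line.
        if (!module.empty() && module.back() != '\n') {
            combined += '\n';
        }
    }
    return combined;
}
```

Note that each standalone PTX file starts with its own `.version` and `.target` directives, so the combined text will repeat them. This sketch does not deal with repeated headers or with symbols defined in more than one module.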

Dynamic parallelism should work by default on NVIDIA devices, since the 
device code itself contains the kernel launch and interacts directly with 
the GPU driver.

There is experimental support for asynchronous dynamic parallelism in the 
emulator and LLVM backend via a user-level library that simply calls cudaLaunch 
from a separate user-level pthread. We plan to move this functionality into 
the CUDA runtime. We also need support for synchronous dynamic parallelism, 
which should be relatively easy but still needs some implementation work. Of 
course, both of these need unit tests.
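
To make the asynchronous/synchronous distinction concrete, here is a toy host-side model in C++. It is not Ocelot code: `ChildLaunchQueue`, `launch_async`, `synchronize` and `parent_kernel` are hypothetical names, and `std::thread` stands in for the pthread mentioned above. An asynchronous launch hands the child "kernel" to another thread and returns immediately. The synchronous case additionally needs the parent to block until its children have finished.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a child-launch mechanism; not Ocelot's API.
class ChildLaunchQueue {
public:
    // Asynchronous launch: the child runs on its own host thread and
    // the caller continues without waiting.
    void launch_async(std::function<void()> child_kernel) {
        workers_.emplace_back(std::move(child_kernel));
    }

    // Synchronous semantics: block until every child launched so far
    // has finished.
    void synchronize() {
        for (std::thread& worker : workers_) {
            if (worker.joinable()) {
                worker.join();
            }
        }
        workers_.clear();
    }

    // Never leave a child thread running past the queue's lifetime.
    ~ChildLaunchQueue() { synchronize(); }

private:
    std::vector<std::thread> workers_;
};

// Toy parent "kernel": launches child_count children, waits for all of
// them, and reports how many completed.
int parent_kernel(int child_count) {
    std::atomic<int> completed{0};
    ChildLaunchQueue queue;
    for (int i = 0; i < child_count; ++i) {
        queue.launch_async([&completed] { completed.fetch_add(1); });
    }
    queue.synchronize();
    return completed.load();
}
```

A unit test for the synchronous path could be as small as checking that `parent_kernel(4)` returns 4, i.e. that no child is still running when the parent resumes.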

I'm not sure what the status is on the AMD backend.

Original comment by SolusStu...@gmail.com on 30 May 2012 at 3:57

  • Added labels: Priority-High, Type-Enhancement
  • Removed labels: Priority-Medium, Type-Defect