Add support for CUDA 5 features: dynamic parallelism etc.
GoogleCodeExporter commented
Just a reminder of the features added in CUDA 5.0 that would be good to have
in gpuocelot:
* SM_30 and SM_35 PTX intrinsics support
* Dynamic parallelism
* Object linking? I don't know if that makes sense here.
Original issue reported on code.google.com by rtf...@gmail.com
on 17 May 2012 at 1:59
GoogleCodeExporter commented
For object linking, only CUBIN linking has been announced for NVCC so far.
PTX linking is supposed to follow in the future. Once it does, Ocelot should
be able to support it with only minor changes, so I plan to wait for that
feature. In the meantime, it is possible to 'link' PTX files together by
simply concatenating them.
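
As a rough illustration of the concatenation workaround, here is a minimal Python sketch. The function names `join_ptx_modules` and `concatenate_ptx_files` are hypothetical and are not part of Ocelot or the CUDA toolkit. The sketch only joins the text; it does not deduplicate headers or symbols, and whether a given combined file loads cleanly is something to check against your own PTX.

```python
from pathlib import Path


def join_ptx_modules(modules):
    """Join PTX module texts in order, one after another.

    Hypothetical helper: it only guarantees that each module ends with a
    newline so the last line of one file does not run into the next.
    """
    return "".join(m if m.endswith("\n") else m + "\n" for m in modules)


def concatenate_ptx_files(sources, output):
    """Read the PTX files in `sources` and write the joined text to `output`."""
    combined = join_ptx_modules([Path(p).read_text() for p in sources])
    Path(output).write_text(combined)
    return combined
```

A call such as `concatenate_ptx_files(["a.ptx", "b.ptx"], "linked.ptx")` would produce one file that can then be loaded as a single module; the file names here are placeholders.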
Dynamic parallelism should be supported by default on NVIDIA devices, since
the device code itself contains the kernel launch and interacts directly with
the GPU driver.
There is experimental support for asynchronous dynamic parallelism in the
emulator and LLVM backend via a user-level library that simply calls cudaLaunch
from a different user pthread. We plan to move this functionality into the
CUDA runtime. We also need support for synchronous dynamic parallelism, which
should be relatively easy but still needs some implementation work. Of course,
both of these need unit tests.
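
To make the asynchronous and synchronous cases concrete, here is a minimal Python sketch of the threading idea only. It is not Ocelot's implementation: `launch_kernel` is a hypothetical stand-in for the runtime's launch call (cudaLaunch in the description above), a "kernel" is just a Python callable, and modelling the synchronous case as an immediate join is an assumption.

```python
import threading


def launch_kernel(kernel, *args):
    """Hypothetical stand-in for the runtime launch call; runs the kernel."""
    kernel(*args)


def launch_child_async(kernel, *args):
    """Asynchronous child launch: issue the launch from a separate thread.

    Returns the thread so the caller can wait for the child later.
    """
    worker = threading.Thread(target=launch_kernel, args=(kernel, *args))
    worker.start()
    return worker


def launch_child_sync(kernel, *args):
    """Synchronous child launch (assumed model): block until the child is done."""
    launch_child_async(kernel, *args).join()
```

In this model the parent keeps running after `launch_child_async` and must call `join()` on the returned thread before reading the child's results, while `launch_child_sync` returns only once the child has completed.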
I'm not sure what the status is on the AMD backend.
Original comment by SolusStu...@gmail.com
on 30 May 2012 at 3:57
- Added labels: Priority-High, Type-Enhancement
- Removed labels: Priority-Medium, Type-Defect