[ENHANCEMENT] Active block check in -fblockfusion_level=2
xysmlx opened this issue · 1 comment
xysmlx commented
🚀 Feature
Check GridDim in -fblockfusion_level=2 to satisfy the active block limitation in CUDA.
Motivation
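BlockFusion with -fblockfusion_level=2 uses inter-block synchronization primitives. An improper number of BEs (vEUs) may lead to deadlock due to the active block limitation in CUDA: if the grid contains more blocks than the GPU can keep resident at once, the resident blocks wait at the barrier for blocks that are never scheduled.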
Pitch
We can use nvcc to check the GridDim after BlockFusion codegen and adaptively change the number of BEs (vEUs) so that it satisfies the active block limitation in CUDA.
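A minimal sketch of the adaptive step, under stated assumptions: the per-SM block count would come from an occupancy query on the generated kernel (for example the CUDA runtime's cudaOccupancyMaxActiveBlocksPerMultiprocessor) and the SM count from cudaDeviceProp::multiProcessorCount. The helper names max_coresident_blocks and clamp_num_bes are hypothetical and are not part of the existing codegen.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: upper bound on the number of blocks that can be
// resident on the device at the same time.
// blocks_per_sm: result of an occupancy query for the fused kernel (assumed input).
// sm_count:      number of SMs on the target device (assumed input).
inline std::int64_t max_coresident_blocks(int blocks_per_sm, int sm_count) {
    if (blocks_per_sm <= 0 || sm_count <= 0) {
        return 0;  // the kernel cannot be made resident with this configuration
    }
    return static_cast<std::int64_t>(blocks_per_sm) * sm_count;
}

// Hypothetical helper: clamp the requested number of BEs (vEUs) so that the
// resulting GridDim never exceeds the co-resident block limit.
// Returns 0 when no valid number of BEs exists; the caller must handle that.
inline std::int64_t clamp_num_bes(std::int64_t requested_bes,
                                  int blocks_per_sm, int sm_count) {
    if (requested_bes <= 0) {
        return 0;
    }
    const std::int64_t limit = max_coresident_blocks(blocks_per_sm, sm_count);
    return std::min(requested_bes, limit);
}
```

For instance, with 2 resident blocks per SM on an 80-SM device, a request for 640 BEs would be reduced to 160, while a request for 100 BEs would be left unchanged.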
Alternatives
Fall back to -fblockfusion_level=1 when the GridDim exceeds the active block limitation. The overhead of inter-block synchronization also grows as the number of blocks increases.
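The fallback rule could be expressed as a small decision function. This is only an illustration: choose_blockfusion_level is a hypothetical name, and active_block_limit is assumed to have been computed elsewhere (for example from an occupancy query multiplied by the SM count).

```cpp
#include <cstdint>

// Hypothetical helper: pick the BlockFusion level for a generated kernel.
// Level 2 relies on inter-block synchronization, so it is only kept when
// every block of the grid can be resident at the same time.
inline int choose_blockfusion_level(std::int64_t grid_dim,
                                    std::int64_t active_block_limit) {
    const bool fits = grid_dim > 0 && grid_dim <= active_block_limit;
    return fits ? 2 : 1;  // otherwise fall back to -fblockfusion_level=1
}
```

A grid of exactly active_block_limit blocks still qualifies for level 2; one block more triggers the fallback.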
Additional context