microsoft/nnfusion

[ENHANCEMENT] Active block check in -fblockfusion_level=2

xysmlx opened this issue · 1 comment

🚀 Feature
Check GridDim in -fblockfusion_level=2 to satisfy the active block limitation in CUDA.

Motivation
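BlockFusion with -fblockfusion_level=2 uses inter-block synchronization primitives. An improper number of BEs (vEUs) may lead to deadlock because of the active block limitation in CUDA: blocks waiting at an inter-block barrier keep their resources, so any blocks beyond the number that can be resident at once are never scheduled and never reach the barrier.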

Pitch
We can use nvcc to check the GridDim after BlockFusion codegen and adaptively change the number of BEs (vEUs) so that it satisfies the active block limitation in CUDA.
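
As a rough illustration of the adaptive step, here is a minimal host-side sketch. It is not NNFusion code: `max_coresident_blocks` and `clamp_num_bes` are hypothetical names, and it assumes one BE maps to one thread block (GridDim.x equals the number of BEs). Instead of inspecting nvcc output as proposed above, it assumes the per-SM figure comes from the CUDA runtime's `cudaOccupancyMaxActiveBlocksPerMultiprocessor` and the SM count from `cudaDeviceProp::multiProcessorCount`.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not part of NNFusion): upper bound on the number of
// thread blocks that can be resident on the device at the same time.
// blocks_per_sm is assumed to come from
// cudaOccupancyMaxActiveBlocksPerMultiprocessor for the fused kernel, and
// num_sms from cudaDeviceProp::multiProcessorCount.
inline std::int64_t max_coresident_blocks(int blocks_per_sm, int num_sms) {
    if (blocks_per_sm <= 0 || num_sms <= 0) {
        return 0;  // the kernel cannot be resident at all
    }
    return static_cast<std::int64_t>(blocks_per_sm) * num_sms;
}

// Hypothetical helper: clamp the requested number of BEs so that the grid
// never exceeds the co-residency limit. Assumes one BE per thread block.
// Returns 0 when no valid configuration exists.
inline std::int64_t clamp_num_bes(std::int64_t requested_bes,
                                  int blocks_per_sm,
                                  int num_sms) {
    const std::int64_t limit = max_coresident_blocks(blocks_per_sm, num_sms);
    if (requested_bes <= 0 || limit == 0) {
        return 0;
    }
    return std::min(requested_bes, limit);
}
```

For example, on a device with 80 SMs where the fused kernel fits 2 blocks per SM, a request for 512 BEs would be reduced to 160, while a request for 100 BEs would be left unchanged.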

Alternatives
Fall back to -fblockfusion_level=1 when the GridDim exceeds the active block limitation. The overhead of inter-block synchronization grows with the number of blocks.

Additional context

Thanks for the report @xysmlx! I will look into it ASAP! (I'm a bot).