Combined inner and outer reduction for layer norm backward
Opened this issue · 0 comments
liqiangxl commented
The feature, motivation and pitch
Combine the inner and outer reductions into a single kernel:
- Accumulate a partial outer reduction while each block loops over the outer domain performing its block-level inner reduction.
- Write the partial outer-reduction results to global memory (gmem).
- Synchronize, then reload the partial results from gmem.
- Remap the parallelization pattern to finalize the outer reduction.

Intended for use in layer norm backward (ln_backward).
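
The steps above can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the proposed kernel: the function name `combined_inner_outer_reduction`, the `num_blocks` parameter, the strided row-to-block assignment, and the use of a plain sum for both reductions are all hypothetical choices made for illustration. In layer norm backward, the inner (per-row) reductions would feed the input gradient and the outer (per-column) reductions would produce the weight and bias gradients.

```python
def combined_inner_outer_reduction(a, num_blocks):
    """Simulate one fused pass over a row-major matrix a[N][H].

    Returns (inner, outer): per-row sums and per-column sums.
    All names here are illustrative, not part of any real API.
    """
    n_rows, n_cols = len(a), len(a[0])
    inner = [0.0] * n_rows  # one result per row (inner reduction)
    # Stand-in for the gmem workspace: one row of partial sums per block.
    workspace = [[0.0] * n_cols for _ in range(num_blocks)]

    # Phase 1: each simulated block loops over its rows of the outer domain.
    for block in range(num_blocks):
        partial = [0.0] * n_cols  # block-local partial outer sums
        for row in range(block, n_rows, num_blocks):  # assumed strided mapping
            inner[row] = sum(a[row])  # block-level inner reduction
            for col in range(n_cols):
                partial[col] += a[row][col]  # accumulate partial outer reduction
        workspace[block] = partial  # write partial results to "gmem"

    # A real kernel would synchronize across blocks here before reloading;
    # this sequential simulation needs no explicit sync.

    # Phase 2: remap parallelism so each column is reduced across blocks.
    outer = [
        sum(workspace[block][col] for block in range(num_blocks))
        for col in range(n_cols)
    ]
    return inner, outer
```

Calling it with a 4x3 matrix and `num_blocks=2` assigns rows 0 and 2 to the first simulated block and rows 1 and 3 to the second; the final phase then reduces only `num_blocks` partial rows instead of all `N` input rows, which is the point of keeping both reductions in one kernel.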
Alternatives
No response
Additional context
No response