microsoft/hlsl-specs

[Feature Request] Inclusive prefix sum version of `WavePrefixSum`

Devaniti opened this issue · 1 comments

Is your feature request related to a problem? Please describe.
Many rendering algorithms involve calculating prefix sums. Sometimes you need an exclusive prefix sum, which we already have with WavePrefixSum, but sometimes you need an inclusive prefix sum.
We don't currently have a direct way to get that.
Additionally, there's sometimes a need to have both the prefix sum and the total sum across the wave, for example when calculating a prefix sum that spans more than one wave. With an inclusive prefix sum you can just read that total from the last active thread. Currently, this is normally done with WavePrefixSum followed by WaveActiveSum, with both intrinsics taking the same input values across all threads.
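
For illustration, the current two-intrinsic pattern might look like the sketch below. The struct and function names (`WaveScanResult`, `WaveScanWithTotal`) are hypothetical and only serve this example; it assumes `uint` inputs and Shader Model 6.0 or later.

```hlsl
// Hypothetical helper type: holds both results of the current pattern.
struct WaveScanResult
{
    uint exclusive; // sum of `value` over active lanes with a lower lane index
    uint total;     // sum of `value` over all active lanes in the wave
};

// Hypothetical helper: today's approach, using two separate wave operations
// that are both fed the same per-lane input.
WaveScanResult WaveScanWithTotal(uint value)
{
    WaveScanResult result;
    result.exclusive = WavePrefixSum(value);
    result.total     = WaveActiveSum(value);
    return result;
}
```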

Describe the solution you'd like
New wave intrinsics:

  • WavePrefixSumInclusive - inclusive prefix sum variant for WavePrefixSum
  • WavePrefixSumExclusive - alias for WavePrefixSum for consistency
  • equivalent new intrinsics for other prefix operations
  • WaveIsLastLane - same as WaveIsFirstLane but for checking if it's the last active lane. Needed to get back total sums.

Optionally, add support for those new intrinsics for older shader models (each one may or may not be added to older SM separately).

  • WavePrefixSumInclusive can be emulated as WavePrefixSum(value) + value.
  • WavePrefixSumExclusive is an alias for an old intrinsic, so it shouldn't be locked behind new shader model in the first place.
  • Same justification goes for equivalent intrinsics for other prefix operations.
  • WaveIsLastLane can be emulated as WavePrefixCountBits(true) + 1 == WaveActiveCountBits(true), though this one is questionable because the emulation is not as trivial.
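
As a rough sketch, the emulations listed above could be written as header-style helpers along these lines. The `Emulated` function names are hypothetical (chosen to avoid clashing with any future built-ins), only a `uint` overload is shown, and the code assumes Shader Model 6.0 wave intrinsics.

```hlsl
// Hypothetical emulation of the proposed WavePrefixSumInclusive:
// the exclusive prefix sum plus this lane's own contribution.
uint WavePrefixSumInclusiveEmulated(uint value)
{
    return WavePrefixSum(value) + value;
}

// Hypothetical emulation of the proposed WavePrefixSumExclusive:
// a plain alias for the existing intrinsic.
uint WavePrefixSumExclusiveEmulated(uint value)
{
    return WavePrefixSum(value);
}

// Hypothetical emulation of the proposed WaveIsLastLane:
// WavePrefixCountBits(true) counts the active lanes with a lower index,
// so on the last active lane it equals the active lane count minus one.
bool WaveIsLastLaneEmulated()
{
    return WavePrefixCountBits(true) + 1 == WaveActiveCountBits(true);
}
```

Whether a compiler lowers these to the same code as a native inclusive scan would is exactly the open question raised in this request.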

Describe alternatives you've considered
I can use WavePrefixSum(value) + value as an alternative, but:

  • It's unclear whether it would work as well as a native inclusive prefix sum would.
  • You can't easily check whether the current lane is the last one, which is needed to access the total sum, without a new intrinsic.

I think WaveActiveMax(WaveGetLaneIndex()) == WaveGetLaneIndex() should be true on the last active lane that participates in these operations.
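
A minimal sketch of how that check might be used to recover the wave total from an emulated inclusive prefix sum. The function name `WaveInclusiveSumWithTotal` and its parameters are hypothetical; it assumes `uint` inputs and that every lane contributing to the sum is active at the call site.

```hlsl
// Hypothetical helper: returns the inclusive prefix sum for this lane and
// writes the wave-wide total (the last active lane's inclusive sum) to `total`.
uint WaveInclusiveSumWithTotal(uint value, out uint total, out bool isLastLane)
{
    // Emulated inclusive prefix sum.
    uint inclusive = WavePrefixSum(value) + value;

    // Highest lane index among the active lanes; uniform across the wave.
    uint lastLane = WaveActiveMax(WaveGetLaneIndex());
    isLastLane = (WaveGetLaneIndex() == lastLane);

    // Broadcast the last active lane's inclusive sum to every lane.
    total = WaveReadLaneAt(inclusive, lastLane);
    return inclusive;
}
```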

One thing we're considering (independent of shader models) is to start shipping some header-implemented additional functionality for HLSL. These might be good candidates to implement in a header shipped with the compiler and we could consider promoting them to DXIL operations if hardware vendors can identify optimization possibilities that are easier to identify based on DXIL operations.