How do I reduce partially filled 2D blocks?

Question

intractabilis opened this issue.

I need to compute a reduction (a minimum) over a texture accessed through a cudaTextureObject_t. I cover the texture with 16x16 blocks, and I cannot find anything in the documentation about using a block reduction when the edge blocks only partially cover the texture. For 1D blocks there is a num_valid parameter, but I am unsure how to use it with a 2D block.

#include <cub/cub.cuh>

template <int blockX, int blockY, cub::BlockReduceAlgorithm alg>
__global__
void cudaMinMaxAmplitude(cudaTextureObject_t tex, int width, int height, int *min) {
    using BlockReduceT = cub::BlockReduce<int, blockX, alg, blockY>;
    __shared__ typename BlockReduceT::TempStorage minStorage;

    // Texel coordinates of this thread (int, so they compare cleanly with width/height).
    const int col = blockIdx.x * blockDim.x + threadIdx.x;
    const int row = blockIdx.y * blockDim.y + threadIdx.y;

    const auto [a, b, g, r] = tex2D<uchar4>(tex, col, row);
    const auto amp = int(g);

    if (width <= col || height <= row) {
        // ????? This thread lies outside the texture: what should it contribute?
    }

    // Block-wide minimum; the aggregate is only valid in thread (0, 0).
    auto blockMin = BlockReduceT(minStorage).Reduce(amp, cub::Min());

    if (0 == threadIdx.x && 0 == threadIdx.y) {
        atomicMin(min, blockMin);
    }
}

Answer 1 · 2023-09-29T16:21:19.000Z

Hello @intractabilis!

The CUB documentation for num_valid states:

The first num_valid threads each contribute one input element.
For multi-dimensional blocks, threads are linearly ranked in row-major order.

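This means that for 2D blocks, num_valid only works when the valid items occupy the first num_valid threads in row-major order, i.e. complete leading rows (plus at most a prefix of the next row). That holds for a block clipped only at the bottom edge of the texture, but not for one clipped at the right edge, where the valid threads of each row are interleaved with invalid ones. If I understand your case correctly, you'd need num_valid to be something like a dim3 instead of an int. I'm not sure we have the capacity to provide such an overload right now.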
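
As a sketch of the row-major rule above (not a CUB API), the hypothetical host-side helper rowMajorNumValid below computes, for one BX x BY block at block coordinates (bx, by), the num_valid value that a 1D-style prefix can express, and returns -1 when the block is clipped on the right so that no such prefix exists. It is plain C++ so it can be checked without a GPU; the helper name and its parameters are assumptions for illustration.

```cpp
#include <algorithm>

// Hypothetical helper (not part of CUB). Returns the num_valid that CUB's
// row-major ranking (rank = threadIdx.y * BX + threadIdx.x) can express for the
// BX x BY block at block coordinates (bx, by) over a width x height domain,
// or -1 if the valid threads do not form a row-major prefix.
int rowMajorNumValid(int bx, int by, int BX, int BY, int width, int height) {
    // Number of in-range columns and rows covered by this block.
    const int validCols = std::max(0, std::min(BX, width - bx * BX));
    const int validRows = std::max(0, std::min(BY, height - by * BY));

    if (validCols == 0 || validRows == 0) return 0;  // block lies fully outside
    if (validCols == BX) return validRows * BX;      // clipped only at the bottom: leading full rows
    if (validRows == 1) return validCols;            // a single partial row is still a prefix
    return -1;                                       // clipped on the right: valid ranks are strided
}
```

Because the result depends only on the block coordinates, it is uniform across a block, so in a kernel every thread would take the same branch: call Reduce(amp, cub::Min(), n) when n >= 0 and fall back to padding otherwise. With 16x16 blocks over a 40x40 texture, block (0, 2) is clipped only at the bottom and gets num_valid = 128, while block (2, 0) is clipped on the right and gets -1.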

Could you instead fill the invalid positions with the identity element of the reduction (for cub::Min, the largest int)? For your case it would look something along these lines:

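#include <cuda/std/limits>
// ...
// Threads outside the texture contribute the identity of min (INT_MAX), so they never change the result.
const bool is_valid = col < width && row < height;
const auto amp = is_valid ? int(g) : cuda::std::numeric_limits<int>::max();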
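
To see why identity padding lets every thread take part in the block reduction unconditionally, here is a plain C++ host-side emulation of the kernel's structure (no GPU or CUB involved; gridMin and its parameters are hypothetical names). Each thread slot of every BX x BY block contributes either its pixel or a padding value, each block takes its minimum, and the block results are combined the way atomicMin would combine them. Padding with INT_MAX, the identity of min, yields the true minimum, while any other padding value, such as 0, can leak into the result whenever the image size is not a multiple of the block size.

```cpp
#include <algorithm>
#include <climits>
#include <vector>

// Host-side emulation (hypothetical, not CUB): a grid of BX x BY "blocks" covers a
// width x height row-major image; every thread slot contributes a value, and slots
// outside the image contribute `pad`.
int gridMin(const std::vector<int>& img, int width, int height,
            int BX, int BY, int pad) {
    const int gridX = (width + BX - 1) / BX;   // ceil-divide, as for the kernel launch
    const int gridY = (height + BY - 1) / BY;
    int globalMin = INT_MAX;                   // stands in for *min before the atomicMin calls
    for (int by = 0; by < gridY; ++by) {
        for (int bx = 0; bx < gridX; ++bx) {
            int blockMin = INT_MAX;            // INT_MAX is neutral for min, so it is a safe start
            for (int ty = 0; ty < BY; ++ty) {
                for (int tx = 0; tx < BX; ++tx) {
                    const int col = bx * BX + tx;
                    const int row = by * BY + ty;
                    const bool isValid = col < width && row < height;
                    const int amp = isValid ? img[row * width + col] : pad;
                    blockMin = std::min(blockMin, amp);  // block-wide reduction over all slots
                }
            }
            globalMin = std::min(globalMin, blockMin);   // what atomicMin(min, blockMin) does
        }
    }
    return globalMin;
}
```

For a 20x20 image whose values are all at least 5, covered by 16x16 blocks, padding with INT_MAX returns 5, whereas padding with 0 returns 0 because the edge blocks contain out-of-range slots.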

Answer 2 · 2023-09-29T17:19:09.000Z

I got it. Thank you, Georgy, it will work.