Issues
How do I reduce partially filled 2D blocks?
#744 opened by intractabilis - 24
Can't get correct result when using CUB with CUDA 12.0
#719 opened by YuanRisheng - 2
Possible bug in variable naming
#743 opened by akshit-sharma - 5
BlockLoad never attempts to vectorize
#739 opened by iclementine - 1
what's the purpose of CUB_SUBSCRIPTION_FACTOR
#732 opened by zhaolianshuizls - 4
Segfault in CachingDeviceAllocator when out of memory
#705 opened by orjgre - 1
Misleading documentation for DeviceSegmentedRadixSort (or I'm using it wrong)
#731 opened by HapeMask - 4
Register-only based `WarpExchange`
#667 opened by pb-dseifert - 6
What is the correct compile command in Linux platform to compile a function citing cuh?
#729 opened by fengwang - 1
Illegal memory access on trying to use `DeviceReduce::Sum()` to count number of non-zeros
#726 opened by alexsamardzic - 1
Documentation of warp-wide collectives refers to `__syncthreads` instead of `__syncwarp`
#677 opened by fkallen - 2
Make decoupled look-back delay part of tuning
#695 opened by gevtushenko - 0
Add policy parameter to allow tuning
#689 opened by gevtushenko - 0
Write example for decoupled look-back API
#699 opened by gevtushenko - 3
Implement tuning db merger
#696 opened by gevtushenko - 3
Unresolved extern function 'cudaLaunchDevice' error while using NVCC 11.x and cub 2.10 with -G
#692 opened by lilohuang - 0
Initial CUB Tuning Infrastructure
#631 opened by gevtushenko - 3
Backport fixes for reordering in CUB member initializer lists into 2.0 branch
#604 opened by gevtushenko - 2
Rewrite remaining tests to use Catch2
#625 opened by gevtushenko - 4
DeviceMemcpy::Batched supports only memory buffers
#672 opened by mfbalin - 2
Can cub::DeviceSegmentedReduce::Reduce support self-defined functor for struct variable instead of just integer?
#666 opened by zlwu92 - 1
Deprecate `cub::Mutex`
#653 opened by gevtushenko - 2
Is there any way to remove duplicates from an array while maintaining the original relative order (that is, without sorting) using a CUDA library?
#655 opened by zlwu92 - 2
DeviceHistogram::HistogramEven and DeviceHistogram::MultiHistogramEven failing for some cases
#619 opened by Beanavil - 1
Compiler error with CUDA <11.5 due to int128
#629 opened by jrhemstad - 1
Is it possible to radix sort a struct?
#643 opened by alibillalhammoud - 0
Why does some code use PTX?
#640 opened by MonroeD - 3
Cub can't be compiled on CLANG with CUDA 11.8
#638 opened by ShuaiShao93 - 6
Support reduce and scan for more than 2^31 items
#584 opened by milthorpe - 0
Question regarding block launch order in CUDA
#636 opened by Snektron - 4
SpMV with different matrix and vector types
#541 opened by michaelmigliore - 0
Question: ScanTileState reliance on longlong2 vector type for reading / writing coherently.
#612 opened by IlyaGrebnov - 3
test/test_device_reduce.cu fails compilation when adding tests for half_f and bfloat16_t.
#615 opened by Beanavil - 0
Problem with cub::BlockExchange
#599 opened by jjxyai - 3
Segfault when launching kernels
#600 opened by orjgre - 1
Invalid result from DeviceSegmentedSort::SortPairs/SortKeys when keys are bool type
#594 opened by davidwendt - 6
Dispatch mechanism may break when any two libraries that use CUB and/or Thrust have been compiled for different sets of GPU architectures
#545 opened by elstehle - 1
BlockRadixRankMatch produces invalid results when warp size does not divide block size
#552 opened by Snektron - 3
Use libcu++ atomics in CUB
#560 opened by gevtushenko - 1
thrust cuda kernel launch
#561 opened by hlq1025