stdgpu is an open-source library providing generic GPU data structures for fast and reliable data management.
- Lightweight C++17 library with minimal dependencies
- CUDA, OpenMP, and HIP (experimental) backends
- Familiar STL-like GPU containers
- High-level, agnostic container functions like insert(begin, end), to write shared C++ code
- Low-level, native container functions like find(key), to write custom CUDA kernels, etc.
- Interoperability with thrust GPU algorithms
Instead of providing yet another ecosystem, stdgpu is designed to be a lightweight container library. Previous libraries such as thrust, VexCL, ArrayFire or Boost.Compute focus on the fast and efficient implementation of various algorithms and only operate on contiguously stored data. stdgpu follows an orthogonal approach and focuses on fast and reliable data management to enable the rapid development of more general and flexible GPU algorithms just like their CPU counterparts.
At its heart, stdgpu offers the following GPU data structures and containers:
- atomic & atomic_ref: Atomic primitive types and references
- bitset: Space-efficient bit array
- deque: Dynamically sized double-ended queue
- queue & stack: Container adapters
- unordered_map & unordered_set: Hashed collections of unique key-value pairs and unique keys
- vector: Dynamically sized contiguous array
In addition, stdgpu provides commonly used helper functionality in algorithm, bit, contract, cstddef, execution, functional, iterator, limits, memory, mutex, numeric, ranges, type_traits, and utility.
In order to reliably perform complex tasks on the GPU, stdgpu offers flexible interfaces that can be used in both agnostic code, e.g. via the algorithms provided by thrust, as well as in native code, e.g. in custom CUDA kernels.
For instance, stdgpu is extensively used in SLAMCast, a scalable live telepresence system, to implement real-time, large-scale 3D scene reconstruction as well as real-time 3D data streaming between a server and an arbitrary number of remote clients.
Agnostic code. In the context of SLAMCast, a simple task is the integration of a range of updated blocks into the duplicate-free set of queued blocks for data streaming, which can be expressed very conveniently:
#include <stdgpu/cstddef.h>             // stdgpu::index_t
#include <stdgpu/iterator.h>            // stdgpu::make_device
#include <stdgpu/unordered_set.cuh>     // stdgpu::unordered_set

class stream_set
{
public:
    void
    add_blocks(const short3* blocks,
               const stdgpu::index_t n)
    {
        set.insert(stdgpu::make_device(blocks),
                   stdgpu::make_device(blocks + n));
    }

    // Further functions

private:
    stdgpu::unordered_set<short3> set;
    // Further members
};
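To make the semantics of the range insertion above easier to check without a GPU, here is a minimal host-only sketch. It is not stdgpu code: `std::unordered_set` stands in for `stdgpu::unordered_set`, and `block3`, `block3_hash`, and `host_stream_set` are hypothetical names introduced only for this illustration. It shows that inserting a range containing repeated blocks leaves each block in the set exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Hypothetical host stand-in for CUDA's short3 (assumption, not part of stdgpu).
struct block3
{
    std::int16_t x, y, z;

    bool operator==(const block3& other) const
    {
        return x == other.x && y == other.y && z == other.z;
    }
};

// Hypothetical hash: packs the three 16-bit coordinates into one 64-bit key.
struct block3_hash
{
    std::size_t operator()(const block3& b) const
    {
        const std::uint64_t key =
            (static_cast<std::uint64_t>(static_cast<std::uint16_t>(b.x)) << 32) |
            (static_cast<std::uint64_t>(static_cast<std::uint16_t>(b.y)) << 16) |
            static_cast<std::uint64_t>(static_cast<std::uint16_t>(b.z));
        return std::hash<std::uint64_t>{}(key);
    }
};

// Host analogue of stream_set: same interface shape, CPU container underneath.
class host_stream_set
{
public:
    void add_blocks(const block3* blocks, const std::ptrdiff_t n)
    {
        assert(n >= 0);
        // Range insertion: duplicates inside [blocks, blocks + n) and blocks
        // already queued are skipped, so the set stays duplicate-free.
        set.insert(blocks, blocks + n);
    }

    std::size_t size() const { return set.size(); }

    bool contains(const block3& b) const { return set.count(b) != 0; }

private:
    std::unordered_set<block3, block3_hash> set;
};
```

Calling `add_blocks` twice with the same range leaves `size()` unchanged, which is the property the streaming queue relies on.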
Native code. More complex operations such as the creation of the duplicate-free set of updated blocks or other algorithms can be implemented natively, e.g. in custom CUDA kernels with stdgpu's CUDA backend enabled:
#include <stdgpu/cstddef.h>             // stdgpu::index_t
#include <stdgpu/unordered_map.cuh>     // stdgpu::unordered_map
#include <stdgpu/unordered_set.cuh>     // stdgpu::unordered_set

__global__ void
compute_update_set(const short3* blocks,
                   const stdgpu::index_t n,
                   const stdgpu::unordered_map<short3, voxel*> tsdf_block_map,
                   stdgpu::unordered_set<short3> mc_update_set)
{
    // Global thread index
    stdgpu::index_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    short3 b_i = blocks[i];

    // Neighboring candidate blocks for the update
    short3 mc_blocks[8]
    = {
        short3(b_i.x - 0, b_i.y - 0, b_i.z - 0),
        short3(b_i.x - 1, b_i.y - 0, b_i.z - 0),
        short3(b_i.x - 0, b_i.y - 1, b_i.z - 0),
        short3(b_i.x - 0, b_i.y - 0, b_i.z - 1),
        short3(b_i.x - 1, b_i.y - 1, b_i.z - 0),
        short3(b_i.x - 1, b_i.y - 0, b_i.z - 1),
        short3(b_i.x - 0, b_i.y - 1, b_i.z - 1),
        short3(b_i.x - 1, b_i.y - 1, b_i.z - 1),
    };

    for (stdgpu::index_t j = 0; j < 8; ++j)
    {
        // Only consider existing neighbors
        if (tsdf_block_map.contains(mc_blocks[j]))
        {
            mc_update_set.insert(mc_blocks[j]);
        }
    }
}
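A kernel like the one above is easier to validate against a CPU reference. The following host-only sketch mirrors its logic under stated assumptions: standard-library containers replace the stdgpu ones, the launch grid is emulated with two loops, and `block3`, `block3_hash`, `voxel`, `blocks_for`, and `compute_update_set_host` are hypothetical names that do not come from stdgpu. The eight neighbor offsets are enumerated from the bits of `j` instead of a literal table, which visits the same candidates in a different order.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <unordered_set>

// Hypothetical host stand-in for CUDA's short3.
struct block3
{
    std::int16_t x, y, z;

    bool operator==(const block3& other) const
    {
        return x == other.x && y == other.y && z == other.z;
    }
};

// Hypothetical hash: packs the three 16-bit coordinates into one 64-bit key.
struct block3_hash
{
    std::size_t operator()(const block3& b) const
    {
        const std::uint64_t key =
            (static_cast<std::uint64_t>(static_cast<std::uint16_t>(b.x)) << 32) |
            (static_cast<std::uint64_t>(static_cast<std::uint16_t>(b.y)) << 16) |
            static_cast<std::uint64_t>(static_cast<std::uint16_t>(b.z));
        return std::hash<std::uint64_t>{}(key);
    }
};

struct voxel {}; // placeholder payload; the real voxel type is not shown in the text

using block_map = std::unordered_map<block3, voxel*, block3_hash>;
using block_set = std::unordered_set<block3, block3_hash>;

// Number of thread blocks needed to cover n items (ceiling division).
inline std::ptrdiff_t blocks_for(std::ptrdiff_t n, std::ptrdiff_t threads_per_block)
{
    return (n + threads_per_block - 1) / threads_per_block;
}

inline void compute_update_set_host(const block3* blocks,
                                    std::ptrdiff_t n,
                                    const block_map& tsdf_block_map,
                                    block_set& mc_update_set,
                                    std::ptrdiff_t threads_per_block = 128)
{
    const std::ptrdiff_t grid = blocks_for(n, threads_per_block);
    for (std::ptrdiff_t block = 0; block < grid; ++block)
    {
        for (std::ptrdiff_t thread = 0; thread < threads_per_block; ++thread)
        {
            // Emulated global thread index with the same bounds guard as the kernel
            const std::ptrdiff_t i = block * threads_per_block + thread;
            if (i >= n) continue;

            const block3 b = blocks[i];
            for (int j = 0; j < 8; ++j)
            {
                // Bits 0, 1, 2 of j decide whether x, y, z are decremented by one
                const block3 candidate{static_cast<std::int16_t>(b.x - (j & 1)),
                                       static_cast<std::int16_t>(b.y - ((j >> 1) & 1)),
                                       static_cast<std::int16_t>(b.z - ((j >> 2) & 1))};

                // Only consider existing neighbors
                if (tsdf_block_map.count(candidate) != 0)
                {
                    mc_update_set.insert(candidate);
                }
            }
        }
    }
}
```

Because set insertion is order-independent, the result should match the contents of the GPU set after the kernel finishes, regardless of how threads were scheduled.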
More examples can be found in the examples directory.
stdgpu requires a C++17 compiler as well as minimal backend dependencies and can be easily built and integrated into your project via CMake.
More guidelines as well as a comprehensive introduction into the design and API of stdgpu can be found in the documentation.
For detailed information on how to contribute, see the Contributing section in the documentation.
Distributed under the Apache 2.0 License. See LICENSE for more information.
If you use stdgpu in one of your projects, please cite the following publications:
stdgpu: Efficient STL-like Data Structures on the GPU
@UNPUBLISHED{stotko2019stdgpu,
    author = {Stotko, P.},
    title  = {{stdgpu: Efficient STL-like Data Structures on the GPU}},
    year   = {2019},
    month  = aug,
    note   = {arXiv:1908.05936},
    url    = {https://arxiv.org/abs/1908.05936}
}
SLAMCast: Large-Scale, Real-Time 3D Reconstruction and Streaming for Immersive Multi-Client Live Telepresence
@article{stotko2019slamcast,
    author  = {Stotko, P. and Krumpen, S. and Hullin, M. B. and Weinmann, M. and Klein, R.},
    title   = {{SLAMCast: Large-Scale, Real-Time 3D Reconstruction and Streaming for Immersive Multi-Client Live Telepresence}},
    journal = {IEEE Transactions on Visualization and Computer Graphics},
    volume  = {25},
    number  = {5},
    pages   = {2102--2112},
    year    = {2019},
    month   = may
}
Patrick Stotko - stotko@cs.uni-bonn.de