jrhemstad
@NVIDIA Lead for CUDA C++ Core Libraries (CCCL) (Thrust/CUB/libcu++). CUDA C++ at the speed-of-(de)light.
@NVIDIA
Minneapolis, MN
Pinned Repositories
cuda_arch_odr
cuda_random_memory
Benchmarks for sequential and random memory accesses to global memory
cuda_scalar_result
Answering "What is the faster way to return a single scalar from a kernel to host?"
example_cuda_benchmark
Template repository for CUDA enabled benchmarks using Google Benchmark
two_largest
Adventure in profiling and optimization.
thrust
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
cudf
cuDF - GPU DataFrame Library
rmm
RAPIDS Memory Manager
jrhemstad's Repositories
jrhemstad/cuda_scalar_result
Answering "What is the faster way to return a single scalar from a kernel to host?"
jrhemstad/example_cuda_benchmark
Template repository for CUDA enabled benchmarks using Google Benchmark
jrhemstad/two_largest
Adventure in profiling and optimization.
jrhemstad/cuda_arch_odr
jrhemstad/nvtx_wrappers
This repository is deprecated and the code has moved to the official NVIDIA NVTX github repository: https://github.com/NVIDIA/NVTX
jrhemstad/creduce-example
Examples of how to use C-Reduce to create minimal compiler bug reproducers
jrhemstad/link_test
Testing linkage of function local statics
jrhemstad/stdexec
`std::execution`, the proposed C++ framework for asynchronous and parallel programming.
jrhemstad/.github
jrhemstad/accelerated-computing-hub
NVIDIA curated collection of educational resources related to general purpose GPU programming.
jrhemstad/cccl
CUDA C++ Core Libraries
jrhemstad/compiler-explorer
Run compilers interactively from your web browser and interact with the assembly
jrhemstad/cub
Cooperative primitives for CUDA C++.
jrhemstad/cuCollections
jrhemstad/cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
jrhemstad/cuda-quantum
C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
jrhemstad/cudf
Python GPU DataFrame Library
jrhemstad/cutlass
CUDA Templates for Linear Algebra Subroutines
jrhemstad/devcontainers
jrhemstad/gil_preload
Add NVTX ranges to Python GIL
jrhemstad/github-markdown
jrhemstad/infra
Infrastructure to set up the public Compiler Explorer instances and compilers
jrhemstad/jrhemstad
jrhemstad/libcudacxx
The NVIDIA C++ Standard Library
jrhemstad/llm.c
LLM training in simple, raw C/CUDA
jrhemstad/nvbench
CUDA Kernel Benchmarking Library
jrhemstad/NVTX
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.
jrhemstad/rmm
RAPIDS Memory Manager
jrhemstad/test_workflow_failure
jrhemstad/thrust
Thrust is a C++ parallel programming library which resembles the C++ Standard Library.