jrhemstad

@NVIDIA Lead for CUDA C++ Core Libraries (CCCL) (Thrust/CUB/libcu++). CUDA C++ at the speed-of-(de)light.

@NVIDIA
Minneapolis, MN

Pinned Repositories

cuda_arch_odr
Language: Shell
cuda_random_memory
Benchmarks for sequential and random memory accesses to global memory
Language: CMake
cuda_scalar_result
Answering "What is the faster way to return a single scalar from a kernel to host?"
Language: CMake
example_cuda_benchmark
Template repository for CUDA enabled benchmarks using Google Benchmark
Language: CMake
two_largest
Adventure in profiling and optimization.
Language: C++
thrust
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
Language: C++
cudf
cuDF - GPU DataFrame Library
Language: C++
rmm
RAPIDS Memory Manager
Language: C++

jrhemstad's Repositories

jrhemstad/cuda_scalar_result
Answering "What is the faster way to return a single scalar from a kernel to host?"
Language: CMake
jrhemstad/example_cuda_benchmark
Template repository for CUDA enabled benchmarks using Google Benchmark
Language: CMake
jrhemstad/two_largest
Adventure in profiling and optimization.
Language: C++
jrhemstad/cuda_arch_odr
Language: Shell
jrhemstad/nvtx_wrappers
This repository is deprecated and the code has moved to the official NVIDIA NVTX github repository: https://github.com/NVIDIA/NVTX
Language: C++
jrhemstad/creduce-example
Examples of how to use C-Reduce to create minimal compiler bug reproducers
Language: Shell
jrhemstad/link_test
Testing linkage of function-local statics
Language: C++
jrhemstad/stdexec
`std::execution`, the proposed C++ framework for asynchronous and parallel programming.
Language: C++
jrhemstad/.github
jrhemstad/accelerated-computing-hub
NVIDIA curated collection of educational resources related to general purpose GPU programming.
jrhemstad/cccl
CUDA C++ Core Libraries
Language: C++
jrhemstad/compiler-explorer
Run compilers interactively from your web browser and interact with the assembly
Language: Assembly
jrhemstad/cub
Cooperative primitives for CUDA C++.
Language: Cuda
jrhemstad/cuCollections
Language: C++
jrhemstad/cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
Language: C++
jrhemstad/cuda-quantum
C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
Language: C++
jrhemstad/cudf
Python GPU DataFrame Library
Language: Cuda
jrhemstad/cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++
jrhemstad/devcontainers
Language: Shell
jrhemstad/gil_preload
Add NVTX ranges to Python GIL
Language: C++
jrhemstad/github-markdown
jrhemstad/infra
Infrastructure to set up the public Compiler Explorer instances and compilers
Language: Python
jrhemstad/jrhemstad
jrhemstad/libcudacxx
The NVIDIA C++ Standard Library
Language: C++
jrhemstad/llm.c
LLM training in simple, raw C/CUDA
jrhemstad/nvbench
CUDA Kernel Benchmarking Library
Language: Cuda
jrhemstad/NVTX
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.
Language: C
jrhemstad/rmm
RAPIDS Memory Manager
Language: C++
jrhemstad/test_workflow_failure
jrhemstad/thrust
Thrust is a C++ parallel programming library which resembles the C++ Standard Library.
Language: C++