Pinned Repositories
aistore
AIStore: scalable storage for AI applications
cuopt
GPU-accelerated decision optimization
cuopt-examples
NVIDIA cuOpt examples for decision optimization
DeepLearningExamples
State-of-the-art deep learning scripts organized by model, easy to train and deploy, with reproducible accuracy and performance on enterprise-grade infrastructure.
GenerativeAIExamples
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
Megatron-LM
Ongoing research on training transformer models at scale
nvidia-container-toolkit
Build and run containers leveraging NVIDIA GPUs
nvidia-docker
Build and run Docker containers leveraging NVIDIA GPUs
open-gpu-kernel-modules
NVIDIA Linux open GPU kernel module source
TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
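A minimal sketch of the open Python API: building a serialized engine from an ONNX model, assuming a TensorRT 8.x-style explicit-batch workflow; "model.onnx" is a placeholder file.

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    # Explicit-batch networks are the standard mode in TensorRT 8.x
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open("model.onnx", "rb") as f:        # placeholder model file
        parser.parse(f.read())
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)      # allow FP16 kernels where profitable
    engine = builder.build_serialized_network(network, config)
    with open("model.engine", "wb") as f:
        f.write(engine)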
NVIDIA Corporation's Repositories
NVIDIA/TensorRT-LLM
TensorRT-LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also contains components to create Python and C++ runtimes that orchestrate inference execution in a performant way.
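A minimal sketch of the high-level LLM API described above; the model identifier and sampling settings are illustrative.

    from tensorrt_llm import LLM, SamplingParams

    # The Hugging Face model ID is a placeholder; TensorRT-LLM compiles it
    # into an optimized engine on first use.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    for output in llm.generate(["What is the capital of France?"], params):
        print(output.outputs[0].text)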
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
NVIDIA/garak
The LLM vulnerability scanner
NVIDIA/NeMo-Guardrails
NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.
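A minimal sketch of wrapping an LLM with programmable rails, assuming a rails configuration directory (config.yml plus Colang flows) at the placeholder path "./config".

    from nemoguardrails import LLMRails, RailsConfig

    config = RailsConfig.from_path("./config")   # placeholder config directory
    rails = LLMRails(config)

    # generate() runs the guarded conversation flow around the underlying LLM
    response = rails.generate(messages=[{"role": "user", "content": "Hello!"}])
    print(response["content"])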
NVIDIA/Isaac-GR00T
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model for generalized humanoid robot reasoning and skills.
NVIDIA/cuda-python
CUDA Python: Performance meets Productivity
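A minimal sketch using the low-level driver-API bindings; each call returns a tuple whose first element is a CUresult error code.

    from cuda import cuda

    (err,) = cuda.cuInit(0)
    assert err == cuda.CUresult.CUDA_SUCCESS

    err, count = cuda.cuDeviceGetCount()
    err, device = cuda.cuDeviceGet(0)
    err, name = cuda.cuDeviceGetName(128, device)   # 128-byte name buffer
    print(f"{count} device(s); device 0: {name.decode().rstrip(chr(0))}")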
NVIDIA/nv-ingest
NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. It uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts, and images for use in downstream generative applications.
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper, Ada, and Blackwell GPUs, providing better performance with lower memory utilization in both training and inference.
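A minimal sketch of FP8 execution with Transformer Engine's PyTorch API, assuming an FP8-capable GPU (Hopper or newer); the layer sizes are illustrative.

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common.recipe import DelayedScaling

    # Drop-in replacement for torch.nn.Linear
    layer = te.Linear(1024, 1024, bias=True).cuda()
    x = torch.randn(16, 1024, device="cuda")

    # Matmuls inside the context run in FP8 using delayed scaling factors
    with te.fp8_autocast(enabled=True, fp8_recipe=DelayedScaling()):
        y = layer(x)
    y.sum().backward()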
NVIDIA/gpu-operator
NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
NVIDIA/cccl
CUDA Core Compute Libraries
NVIDIA/TensorRT-Model-Optimizer
A unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, and speculative decoding. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed.
NVIDIA/NeMo-Agent-Toolkit
The NVIDIA NeMo Agent toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents.
NVIDIA/MatX
An efficient C++17 GPU numerical computing library with Python-like syntax
NVIDIA/Q2RTX
NVIDIA’s implementation of RTX ray-tracing in Quake II
NVIDIA/cuda-quantum
C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
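A minimal sketch of the Python side of the programming model: a Bell-state kernel sampled on the default simulator target.

    import cudaq

    @cudaq.kernel
    def bell():
        qubits = cudaq.qvector(2)
        h(qubits[0])                  # Hadamard on the first qubit
        x.ctrl(qubits[0], qubits[1])  # CNOT entangling the pair
        mz(qubits)                    # measure in the computational basis

    counts = cudaq.sample(bell, shots_count=1000)
    print(counts)  # expect roughly even counts for '00' and '11'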
NVIDIA/NeMo-Skills
A project to improve the skills of large language models
NVIDIA/jitify
A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
NVIDIA/bionemo-framework
BioNeMo Framework: For building and adapting AI models in drug discovery at scale
NVIDIA/cuopt
GPU-accelerated decision optimization
NVIDIA/cuda-checkpoint
CUDA checkpoint and restore utility
NVIDIA/Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
NVIDIA/nvshmem
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmers to perform one-sided communication from within CUDA kernels and on CUDA streams.
NVIDIA/cuEquivariance
cuEquivariance is a math library providing low-level primitives and tensor ops to accelerate widely used models based on equivariant neural networks, such as DiffDock, MACE, Allegro, and NEQUIP. It also includes kernels for accelerated structure prediction.
NVIDIA/VisRTX
NVIDIA OptiX based implementation of ANARI
NVIDIA/numba-cuda
The CUDA target for Numba
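A minimal sketch of a kernel compiled by Numba's CUDA target; the axpy example is hypothetical.

    import numpy as np
    from numba import cuda

    @cuda.jit
    def axpy(out, a, x, y):
        i = cuda.grid(1)              # global thread index
        if i < out.size:
            out[i] = a * x[i] + y[i]

    n = 1 << 20
    x = np.random.rand(n).astype(np.float32)
    y = np.random.rand(n).astype(np.float32)
    out = np.zeros_like(x)

    threads = 256
    blocks = (n + threads - 1) // threads
    axpy[blocks, threads](out, np.float32(2.0), x, y)  # NumPy arrays are transferred automatically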
NVIDIA/cuDecomp
An Adaptive Pencil Decomposition Library for NVIDIA GPUs
NVIDIA/topograph
A toolkit for discovering cluster network topology.
NVIDIA/spark-rapids-jni
RAPIDS Accelerator JNI For Apache Spark
NVIDIA/cloud-native-docs
Documentation repository for NVIDIA Cloud Native Technologies
NVIDIA/nvidia-dlfw-inspect
A tool that facilitates debugging convergence issues and testing new algorithms and recipes for training LLMs with NVIDIA libraries such as Transformer Engine, Megatron-LM, and NeMo.