Pinned Repositories
AutoGrad_CPP
Autograd can automatically differentiate C++ code
cutlass
CUDA Templates for Linear Algebra Subroutines
documents
llvm-doc
MAI
MAI is a neural network inference engine
MemoryArena
MemoryArena is used to automatically manage memory allocation and deallocation.
mvm
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Tools
gavinchen430's Repositories
gavinchen430/AutoGrad_CPP
Autograd can automatically differentiate C++ code
gavinchen430/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
gavinchen430/cutlass
CUDA Templates for Linear Algebra Subroutines
gavinchen430/documents
gavinchen430/llvm-doc
gavinchen430/MAI
MAI is a neural network inference engine
gavinchen430/MemoryArena
MemoryArena is used to automatically manage memory allocation and deallocation.
gavinchen430/mvm
gavinchen430/Tools