Pinned Repositories
MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
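The quantization repositories above (AutoGPTQ, AutoAWQ, llm-awq) all compress LLM weights to low-bit integers. As a rough illustration of the core idea, here is a minimal sketch of symmetric 4-bit weight quantization in plain Python; this is not code from any of these projects (AWQ in particular additionally rescales salient weight channels using activation statistics before quantizing):

```python
# Minimal sketch of symmetric per-tensor 4-bit quantization (illustrative only;
# real libraries use per-group scales, packed int4 storage, and fused kernels).

def quantize_4bit(weights):
    """Map floats to signed 4-bit integers in [-8, 7] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float weights from 4-bit codes."""
    return [x * scale for x in q]

w = [0.12, -0.5, 0.33, 0.7]
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)
```

The reconstruction error is bounded by half the scale step, which is why techniques like AWQ focus on keeping the scale small for the channels that matter most.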
cutlass
CUDA Templates for Linear Algebra Subroutines
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
nvitop
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
archbase
Open-source edition of the textbook Foundations of Computer Architecture (Hu Weiwu et al., 3rd edition)
MNN
MNN is a lightweight deep neural network inference engine.
reinforcement-learning-an-introduction
Python code for Reinforcement Learning: An Introduction
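The reinforcement-learning-an-introduction repo implements the exercises from Sutton & Barto. As a flavor of what that code covers, here is a minimal, self-contained sketch of the Chapter 2 epsilon-greedy multi-armed bandit with sample-average value estimates (my own illustrative version, not code taken from the repository):

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Gaussian k-armed bandit with incremental
    sample-average estimates (Sutton & Barto, Chapter 2)."""
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k   # running value estimate per arm
    counts = [0] * k        # number of pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                          # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental mean update: Q += (R - Q) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

est, cnt = epsilon_greedy_bandit([0.2, 0.8, 0.5])
```

With enough steps, the arm with the highest true mean ends up pulled far more often than the others, which is the basic explore/exploit trade-off the book's second chapter studies.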
yyfcc17's Repositories
yyfcc17/archbase
Open-source edition of the textbook Foundations of Computer Architecture (Hu Weiwu et al., 3rd edition)
yyfcc17/MNN
MNN is a lightweight deep neural network inference engine.
yyfcc17/reinforcement-learning-an-introduction
Python code for Reinforcement Learning: An Introduction