CallmeZhangChenchen

Pinned Repositories

ChatTTS
A generative speech model for daily dialogue.
Language:Python31.1k 179 5153.4k
Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Language:Python671 9 13293
AutoGPTQ
An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.
Language:Python4.4k 31 454465
stable-fast
Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
Language:Python00
stable-fast
Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
Language:Python1.2k 18 12270
TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Language:C++10.6k 157 3.7k2.1k
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Language:C++8.3k 88 1.8k920
Qwen2
Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.
Language:Shell7.5k 42 772460
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Language:Python8.1k 139 3.7k1.5k

CallmeZhangChenchen/stable-fast
Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
Language:Python00