Pinned Repositories
ChatTTS
A generative speech model for daily dialogue.
Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for large-scale LLM & VLM training, developed by Alibaba Cloud.
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
stable-fast
Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Qwen2
Qwen2 is the large language model series developed by the Qwen team, Alibaba Cloud.
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
CallmeZhangChenchen's Repositories
CallmeZhangChenchen/stable-fast
Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.