Pinned Repositories
AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
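A minimal sketch of AutoAWQ's documented quantize-and-save flow; the model path and quantization settings below are illustrative assumptions, not project defaults.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Illustrative paths and settings -- substitute your own model.
model_path = "mistralai/Mistral-7B-v0.1"
quant_path = "mistral-7b-awq"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize weights to 4-bit AWQ and save the result for inference.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```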
xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
EasyContext
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
ChatGPT-Next-Web
One-click deployment of a well-designed ChatGPT web UI on Vercel. Get your own ChatGPT web service with a single click.
Easy-Translate
Easy-Translate is a script for translating large text files with a SINGLE COMMAND. Easy-Translate is designed to be as easy as possible for beginners and as seamless and customizable as possible for advanced users.
exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
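A hedged sketch of loading a quantized model and generating text with exllamav2, following the pattern from the project's examples; the model directory and sampler settings are placeholders, and API details may vary between versions.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Placeholder path to a quantized (e.g. EXL2-format) model directory.
config = ExLlamaV2Config()
config.model_dir = "/models/llama2-7b-exl2"
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # split weights across available GPU memory

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

# Generate up to 100 new tokens from a prompt.
print(generator.generate_simple("Once upon a time,", settings, 100))
```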
PiPPy
Pipeline Parallelism for PyTorch
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
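A minimal offline-inference sketch using vLLM's documented LLM entry point; the model name and sampling values are placeholder assumptions.

```python
from vllm import LLM, SamplingParams

# Placeholder model; any Hugging Face causal LM supported by vLLM works.
llm = LLM(model="facebook/opt-125m")
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Batched generation: vLLM schedules all prompts through its paged KV cache.
outputs = llm.generate(["Hello, my name is", "The capital of France is"], sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```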
laoda513's Repositories
laoda513/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
laoda513/ChatGPT-Next-Web
One-click deployment of a well-designed ChatGPT web UI on Vercel. Get your own ChatGPT web service with a single click.
laoda513/Easy-Translate
Easy-Translate is a script for translating large text files with a SINGLE COMMAND. Easy-Translate is designed to be as easy as possible for beginners and as seamless and customizable as possible for advanced users.
laoda513/exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs