Pinned Repositories
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
sglang
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
GPTQModel
Production-ready LLM compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
LRL-ModelCloud's Repositories
LRL-ModelCloud/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
LRL-ModelCloud/GPTQModel
LRL-ModelCloud/sglang
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
LRL-ModelCloud/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs