Pinned Repositories
flashinfer
FlashInfer: Kernel Library for LLM Serving
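A minimal single-request decode sketch against FlashInfer's fused attention kernel; the single_decode_with_kv_cache entry point, tensor shapes, dtypes, and defaults are assumptions to verify against the library's current docs.

    import torch
    import flashinfer

    num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 32, 128, 2048

    # One decode step: a single query token attends over the cached keys/values.
    q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
    k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
    v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

    # Assumed API: fused decode attention over a contiguous KV cache (default NHD layout).
    out = flashinfer.single_decode_with_kv_cache(q, k, v)
    print(out.shape)  # expected: [num_qo_heads, head_dim]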
GPTQModel
Production-ready LLM compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
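A minimal quantize-and-save sketch, assuming recent GPTQModel releases expose GPTQModel.load, QuantizeConfig, model.quantize, and model.save; the model id and calibration text are illustrative, and names may differ across versions.

    from gptqmodel import GPTQModel, QuantizeConfig

    # 4-bit GPTQ with a group size of 128 (common settings, shown as an assumption).
    quant_config = QuantizeConfig(bits=4, group_size=128)

    # Tiny illustrative calibration set; real runs use a few hundred samples.
    calibration_data = [
        "GPTQ quantizes weights layer by layer against a small calibration set.",
        "The resulting checkpoint can be served via HF Transformers, vLLM, or SGLang.",
    ]

    model = GPTQModel.load("meta-llama/Llama-3.2-1B-Instruct", quant_config)  # illustrative model id
    model.quantize(calibration_data)
    model.save("Llama-3.2-1B-Instruct-gptq-4bit")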
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
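A minimal sketch following AutoGPTQ's documented basic flow (quantize, save, reload); the model id and calibration text are illustrative.

    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    pretrained_model_dir = "facebook/opt-125m"  # illustrative small model
    quantized_model_dir = "opt-125m-4bit"

    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)

    # Calibration examples are tokenized prompts.
    examples = [tokenizer("AutoGPTQ quantizes model weights with the GPTQ algorithm.")]

    quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

    model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
    model.quantize(examples)
    model.save_quantized(quantized_model_dir)

    # Reload the quantized weights for inference.
    model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0")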
sglang
SGLang is a fast serving framework for large language models and vision language models.
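A minimal frontend-language sketch, assuming an SGLang server is already running locally; the endpoint, port, and prompt are illustrative.

    import sglang as sgl

    # Assumes a server launched separately, e.g.:
    #   python -m sglang.launch_server --model-path <model> --port 30000
    sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

    @sgl.function
    def qa(s, question):
        s += sgl.user(question)
        s += sgl.assistant(sgl.gen("answer", max_tokens=128))

    state = qa.run(question="What does a serving framework optimize for?")
    print(state["answer"])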
ZX-ModelCloud's Repositories
ZX-ModelCloud/GPTQModel
ZX-ModelCloud/sglang
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
ZX-ModelCloud/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
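A minimal offline batched-generation sketch with vLLM's LLM and SamplingParams; the model id and prompt are illustrative.

    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")  # illustrative small model
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    outputs = llm.generate(["The key idea behind paged attention is"], sampling_params)
    for output in outputs:
        print(output.outputs[0].text)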