Pinned Repositories
flashinfer
FlashInfer: Kernel Library for LLM Serving
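A minimal single-request decode sketch against FlashInfer's fused attention kernel; the single_decode_with_kv_cache entry point, tensor shapes, dtypes, and defaults are assumptions to verify against the library's current docs.

    import torch
    import flashinfer

    num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 32, 128, 2048

    # One decode step: a single query token attends over the cached keys/values.
    q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
    k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
    v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

    # Assumed API: fused decode attention over a contiguous KV cache (default NHD layout).
    out = flashinfer.single_decode_with_kv_cache(q, k, v)
    print(out.shape)  # expected: [num_qo_heads, head_dim]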
GPTQModel
Production-ready LLM compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
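A minimal quantize-and-save sketch, assuming recent GPTQModel releases expose GPTQModel.load, QuantizeConfig, model.quantize, and model.save; the model id and calibration text are illustrative, and names may differ across versions.

    from gptqmodel import GPTQModel, QuantizeConfig

    # 4-bit GPTQ with a group size of 128 (common settings, shown as an assumption).
    quant_config = QuantizeConfig(bits=4, group_size=128)

    # Tiny illustrative calibration set; real runs use a few hundred samples.
    calibration_data = [
        "GPTQ quantizes weights layer by layer against a small calibration set.",
        "The resulting checkpoint can be served via HF Transformers, vLLM, or SGLang.",
    ]

    model = GPTQModel.load("meta-llama/Llama-3.2-1B-Instruct", quant_config)  # illustrative model id
    model.quantize(calibration_data)
    model.save("Llama-3.2-1B-Instruct-gptq-4bit")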
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
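A minimal sketch following AutoGPTQ's documented basic flow (quantize, save, reload); the model id and calibration text are illustrative.

    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    pretrained_model_dir = "facebook/opt-125m"  # illustrative small model
    quantized_model_dir = "opt-125m-4bit"

    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)

    # Calibration examples are tokenized prompts.
    examples = [tokenizer("AutoGPTQ quantizes model weights with the GPTQ algorithm.")]

    quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

    model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
    model.quantize(examples)
    model.save_quantized(quantized_model_dir)

    # Reload the quantized weights for inference.
    model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0")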
sglang
SGLang is a fast serving framework for large language models and vision language models.
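A minimal frontend-language sketch, assuming an SGLang server is already running locally; the endpoint, port, and prompt are illustrative.

    import sglang as sgl

    # Assumes a server launched separately, e.g.:
    #   python -m sglang.launch_server --model-path <model> --port 30000
    sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

    @sgl.function
    def qa(s, question):
        s += sgl.user(question)
        s += sgl.assistant(sgl.gen("answer", max_tokens=128))

    state = qa.run(question="What does a serving framework optimize for?")
    print(state["answer"])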
ZX-ModelCloud's Repositories
ZX-ModelCloud/GPTQModel
ZX-ModelCloud/sglang
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
ZX-ModelCloud/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
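A minimal offline batched-generation sketch with vLLM's LLM and SamplingParams; the model id and prompt are illustrative.

    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")  # illustrative small model
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    outputs = llm.generate(["The key idea behind paged attention is"], sampling_params)
    for output in outputs:
        print(output.outputs[0].text)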