Pinned Repositories
PrefixQuant
An algorithm for static activation quantization of LLMs
deepcompressor
Model Compression Toolbox for Large Language Models and Diffusion Models
qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
geqian-9192's Repositories
geqian-9192/llm-action
This project aims to share technical principles and hands-on experience with large language models (LLM engineering and real-world LLM application deployment)
geqian-9192/llmc
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".