Pinned Repositories
PrefixQuant
An algorithm for static activation quantization of LLMs
deepcompressor
Model Compression Toolbox for Large Language Models and Diffusion Models
qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
geqian-9192's Repositories
geqian-9192/llm-action
This project aims to share technical principles and hands-on experience with large language models (LLM engineering and real-world LLM application deployment)
geqian-9192/llmc
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".