Pinned Repositories
BiLLM
(ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization, with a 2x speedup during inference.
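As a rough illustration of what 4-bit weight quantization means, here is a minimal round-to-nearest sketch with per-group scales and zero-points. This is not AutoAWQ's actual implementation (AWQ additionally protects salient weight channels using activation statistics); the function names and group size are illustrative assumptions.

```python
import numpy as np

def quantize_int4(w, group_size=128):
    """Round-to-nearest 4-bit quantization with per-group scale/zero-point.

    Illustrative sketch only -- not AutoAWQ's algorithm. Weights are split
    into groups of `group_size` values; each group gets its own scale and
    zero-point so that its range maps onto the 16 levels 0..15.
    """
    w = np.asarray(w, dtype=np.float32).reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0            # 4 bits -> 16 levels
    zero = np.round(-w_min / scale)           # maps w_min onto level 0
    q = np.clip(np.round(w / scale + zero), 0, 15)
    return q.astype(np.uint8), scale, zero

def dequantize_int4(q, scale, zero):
    """Recover approximate float weights from 4-bit codes."""
    return (q.astype(np.float32) - zero) * scale
```

The maximum reconstruction error per weight is half a quantization step (scale / 2), which is why per-group (rather than per-tensor) scales matter: smaller groups track local weight ranges and shrink the step size.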
quip-sharp
flute
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
llmc
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
EfficientQAT
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
Llama3.1-Finetuning
Full-parameter fine-tuning, LoRA fine-tuning, and QLoRA fine-tuning for Llama 3.
AQLM
Official PyTorch repository for "Extreme Compression of Large Language Models via Additive Quantization" (https://arxiv.org/pdf/2401.06118.pdf) and "PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression" (https://arxiv.org/abs/2405.14852).
SpQR
LiMa-cas's Repositories
LiMa-cas doesn't have any repositories yet.