Issues
TesseraQ 2-bit Inference for Qwen2-VL-7B-Instruct
#247 opened by Oehri-Sven - 10
Error when running vLLM inference after quantizing internlm2-chat-1_8b
#221 opened by baisesj - 6
How to Quantize with SpinQuant and Export to vLLM
#175 opened by TweedBeetle - 1
Discord link broken
#176 opened by TweedBeetle - 5
ChatGLM series model support
#166 opened by simplew2011 - 2
PPL results for AWQ are not correct?
#161 opened by yc2367 - 2
Failed to run AWQ on Qwen2-7B
#158 opened by Muuut - 2
Can llmc support SmoothQuant W8A8 inference with the TRT-LLM backend?
#136 opened by GuangyanZhang - 1
Can this be used for deployment on Qualcomm chips?
#123 opened by xieyi4650 - 1
LightLLM deployment and inference issue
#65 opened by lzd19981105 - 10
Llama3-8B-Instruct fails for TensorRT-LLM
#21 opened by gloritygithub11 - 2
Failed to save quantized model
#97 opened by LiMa-cas - 4
Where is run_awq_llama.sh?
#94 opened by LiMa-cas - 5
Failed to run AWQ + OmniQuant on Qwen2-7B
#57 opened by gloritygithub11 - 2
Wanda OOM to be fixed
#73 opened by guanchenl - 2
Failed to run AWQ on Qwen2-7B-Instruct
#55 opened by gloritygithub11 - 3
Failed to run AWQ on Qwen2-7B-Instruct
#40 opened by gloritygithub11 - 2
Failed to test the generate API for SmoothQuant on Qwen2-7B with LightLLM
#41 opened by gloritygithub11 - 3
Failed to run SmoothQuant on Llama3-8B
#39 opened by gloritygithub11 - 2
Llama3.1-70B awq_w4a4 error
#25 opened by lihongqiang - 1
Why does the model size not change after compressing Llama-2-7b-hf with Wanda?
#30 opened by ChengShuting - 3
Raising exception: CUDA out of memory when quantizing Mistral-Large-2 (123B), using export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 on an H100
#28 opened by BinFuPKU - 1
calib config bs=-1: RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 134 but got size 512 for tensor number 1 in the list
#6 opened by Worromots - 4
How to use TensorRT-LLM as backend
#5 opened by Worromots - 3