Issues
TesseraQ 2-bit Inference for Qwen2-VL-7B-Instruct
#247 opened by Oehri-Sven - 10
Error when running vLLM inference after quantizing internlm2-chat-1_8b
#221 opened by baisesj - 6
How to Quantize with SpinQuant and Export to vLLM
#175 opened by TweedBeetle - 1
Discord link broken
#176 opened by TweedBeetle - 5
ChatGLM series model support
#166 opened by simplew2011 - 2
PPL results for AWQ are not correct?
#161 opened by yc2367 - 2
Failed to run AWQ on Qwen2-7B
#158 opened by Muuut - 2
Can llmc support SmoothQuant W8A8 inference with the TRT-LLM backend?
#136 opened by GuangyanZhang - 1
Can this be used for deployment on Qualcomm chips?
#123 opened by xieyi4650 - 1
LightLLM deployment and inference issue
#65 opened by lzd19981105 - 10
Llama3-8B-Instruct fails for TensorRT-LLM
#21 opened by gloritygithub11 - 2
Failed to save quantized model
#97 opened by LiMa-cas - 4
Where is run_awq_llama.sh?
#94 opened by LiMa-cas - 5
Failed to run AWQ + OmniQuant on Qwen2-7B
#57 opened by gloritygithub11 - 2
Wanda OOM to be fixed
#73 opened by guanchenl - 2
Failed to run AWQ on Qwen2-7B-Instruct
#55 opened by gloritygithub11 - 3
Failed to run AWQ on Qwen2-7B-Instruct
#40 opened by gloritygithub11 - 2
Failed to test the generate API for SmoothQuant on Qwen2-7B with LightLLM
#41 opened by gloritygithub11 - 3
Failed to run SmoothQuant on Llama3-8B
#39 opened by gloritygithub11 - 2
Llama3.1-70B awq_w4a4 error
#25 opened by lihongqiang - 1
Why does the model size not change after compressing Llama-2-7b-hf with Wanda?
#30 opened by ChengShuting - 3
Raising exception: CUDA out of memory when quantizing Mistral-Large-2 (123B), using export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 on an H100
#28 opened by BinFuPKU - 1
calib config bs=-1: RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 134 but got size 512 for tensor number 1 in the list
#6 opened by Worromots - 4
How to use TensorRT-LLM as backend
#5 opened by Worromots - 3