Issues
Syntax changed in triton.testing.do_bench(), causing an error when running llama_inference.py
#285 opened by prasanna - 0
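A likely cause: newer Triton releases renamed do_bench's `percentiles` keyword to `quantiles`. A minimal compatibility shim, assuming the caller only needs the median timing (keyword names and return shapes are from the Triton versions I know of; verify against the installed release):

```python
import triton.testing

def do_bench_compat(fn):
    """Benchmark fn across old and new triton.testing.do_bench signatures."""
    try:
        # Newer Triton takes `quantiles` and returns the requested quantiles.
        return triton.testing.do_bench(fn, quantiles=(0.5,))[0]
    except TypeError:
        # Older Triton takes `percentiles` instead.
        return triton.testing.do_bench(fn, percentiles=(0.5,))[0]
```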
Add support for MiniCPM
#289 opened by LDLINGLINGLING - 0
GPTQ vs bitsandbytes
#288 opened by iaoxuesheng - 0
Error when loading a GPTQ model
#287 opened by KyrieCui - 1
Dependency conflicts for `safetensors`
#260 opened by Yiximail - 1
_pickle.UnpicklingError: invalid load key, 'v'.
#249 opened by ahnHeejune - 2
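The load key `'v'` is the classic signature of a Git LFS pointer file (its text begins with `version https://git-lfs...`) being fed to torch.load instead of the real weights. A quick check, with the checkpoint path as a placeholder:

```python
# If the file is an LFS pointer rather than real weights, torch.load fails
# with "invalid load key, 'v'". Inspect the header before loading.
path = "llama7b-4bit.pt"  # hypothetical checkpoint path
with open(path, "rb") as f:
    head = f.read(16)
if head.startswith(b"version https://"):
    raise RuntimeError("Git LFS pointer file; run `git lfs pull` first.")
```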
Inference with the saved model fails: AttributeError: module 'torch.backends.cuda' has no attribute 'sdp_kernel'
#271 opened by LuciaIsFine - 2
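torch.backends.cuda.sdp_kernel is a PyTorch 2.0 addition, so inference code that uses it fails on 1.x. A guarded sketch that degrades to a no-op context on older builds:

```python
from contextlib import nullcontext

import torch

# sdp_kernel (a context manager selecting attention backends) only exists
# in PyTorch >= 2.0; fall back to a no-op context elsewhere.
if hasattr(torch.backends.cuda, "sdp_kernel"):
    attn_ctx = torch.backends.cuda.sdp_kernel(enable_flash=True)
else:
    attn_ctx = nullcontext()

with attn_ctx:
    pass  # run the model's forward pass here
```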
Porting GPTQ to CPU?
#240 opened by yiliu30 - 2
The inference speed of a GPTQ 4-bit quantized model
#252 opened by pineking - 0
Support Mistral.
#284 opened by nbollman - 0
neox.py needs to add "import math"
#282 opened by StudyingShao - 0
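For the record, the missing import causes a NameError at the first `math.` reference; the fix is a one-line import at the top of neox.py:

```python
# Top of neox.py: without this, any use of math.sqrt etc. raises
# NameError: name 'math' is not defined.
import math
```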
LoRA and how it differs from bitsandbytes
#281 opened by RonanKMcGovern - 1
Transformers broke again (AttributeError: 'GPTQ' object has no attribute 'inp1')
#280 opened by EyeDeck - 1
Would GPTQ be able to support LLaMA 2?
#278 opened by moonlightian - 0
Can I quantize the HF version of the LLaMA model?
#279 opened by akanyaani - 2
Help: Quantized llama-7b model with custom prompt format produces only gibberish
#276 opened by Glavin001 - 3
Proposed changes to reduce VRAM usage and potentially quantize larger models on consumer hardware
#269 opened by sigmareaver - 1
Issue with GPTQ
#274 opened by d0lphin - 1
High PPL when groupsize != -1 for the OPT model after replacing linear layers with QuantLinear
#275 opened by hyx1999 - 2
Can it support the OpenLLaMA model?
#273 opened by Ted8000 - 0
llama_inference 4-bit error
#270 opened by gjm441 - 2
AttributeError: 'QuantLinear' object has no attribute 'weight' (t5 branch) (Google/flan-ul2)
#268 opened by sigmareaver - 1
CUDA out of memory on flan-ul2
#265 opened by sigmareaver - 4
[Question] What is the expected discrepancy between simulated and actually computed values?
#261 opened by set-soft - 2
The detected CUDA version (12.1) mismatches the version that was used to compile PyTorch (11.7)
#262 opened by siddhsql - 2
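The extension build checks the system CUDA toolkit against the one PyTorch was compiled with; a quick way to see both before reinstalling a matching torch wheel or toolkit:

```python
import torch

# The wheel's CUDA version must match (at least in major version) the
# toolkit that nvcc uses to compile the extension.
print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)   # e.g. '11.7'
print("GPU available:", torch.cuda.is_available())
```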
Sample code does not work
#243 opened by foamliu - 0
SqueezeLLM support?
#264 opened by nikshepsvn - 0
What is the right perplexity number?
#263 opened by JianbangZ - 0
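Reported perplexities differ mainly in tokenizer, context length, and striding, so numbers are only comparable under one fixed recipe. A minimal sketch of the usual definition, exp of the mean token negative log-likelihood, assuming an HF-style causal LM:

```python
import torch

def perplexity(model, input_ids):
    """exp(mean NLL); `model` and `input_ids` are assumed to be a
    Hugging Face causal LM and a (1, seq_len) token tensor."""
    with torch.no_grad():
        loss = model(input_ids, labels=input_ids).loss  # mean NLL per token
    return torch.exp(loss).item()
```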
Finetuning Quantized LLaMA
#259 opened by Qifeng-Wu99 - 0
Comparison with llama.cpp int4 quantization?
#257 opened by luohao123 - 0
How to quantize BLOOM after LoRA/P-tuning?
#255 opened by moonlightian - 2
AttributeError: module 'torch.nn.functional' has no attribute 'scaled_dot_product_attention'
#239 opened by leszekhanusz - 1
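F.scaled_dot_product_attention likewise only exists from PyTorch 2.0 on. A fallback sketch for older versions (ignoring masks and dropout for brevity):

```python
import math

import torch
import torch.nn.functional as F

def sdpa(q, k, v):
    # Use the fused kernel when available (PyTorch >= 2.0) ...
    if hasattr(F, "scaled_dot_product_attention"):
        return F.scaled_dot_product_attention(q, k, v)
    # ... otherwise compute plain softmax attention (no mask, no dropout).
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return scores.softmax(dim=-1) @ v
```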
I used `python llama.py` to generate a quantized model, but I can't find the .safetensors model
#254 opened by jimi202008 - 0
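In most branches of this repo, llama.py writes a .safetensors file only when a dedicated save flag is passed (flag names vary across branches; check `python llama.py -h`). Saving a state dict in that format by hand is a one-liner with the safetensors package:

```python
import torch
from safetensors.torch import save_file

state = {"weight": torch.zeros(4, 4)}  # hypothetical quantized state dict
save_file(state, "llama7b-4bit.safetensors")
```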
Is 3-bit quantization not supported?
#248 opened by foamliu - 0
Unable to run 'python setup_cuda.py install'
#242 opened by alannoote96 - 0
6-bit quantization
#236 opened by philipturner - 1
fastest-inference-4bit fails to build
#233 opened by lee-b - 0
Giepeto
#234 opened by IsaacGanon - 0
Benchmark broken on H100
#231 opened by FrederikAbitz