Issues
Support for MoE model?
#156 opened by laoda513 - 1
Error attempting to finetune llama2-70b
#139 opened by tensiondriven - 1
Finetuning CodeLLaMA34B - RuntimeError: The size of tensor a (1024) must match the size of tensor b (8192)
#152 opened by juanps90 - 1
3 errors detected in the compilation of "src/alpaca_lora_4bit/quant_cuda/quant_cuda_kernel.cu"
#150 opened by kkaarrss - 8
Monkeypatch problem
#97 opened by yfliao - 7
OOM on inference while I can finetune with more tokens
#146 opened by nepeee - 4
Unable to Build Wheels
#144 opened by VegaStarlake - 1
Merging LoRA after finetune
#145 opened by gameveloster - 2
Targeting all layers and biases
#141 opened by grimulkan - 8
Feature request: Stop when loss reaches X
#142 opened by tensiondriven - 1
LoRA Output Identical to Base Model
#137 opened by LegendBegins - 1
Flash Attention 2
#138 opened by Jeduh - 2
How to use inference.py after finetune.py?
#136 opened by athenawisdoms - 1
Gibberish results for non-disabled "faster_mode" using "vicuna-7B-GPTQ-4bit-128g" model
#127 opened by alex4321 - 4
Crashes during finetuning
#131 opened by gameveloster - 12
Update docs for > 2048 token models (SuperHOT)?
#129 opened by tensiondriven - 3
Differences between QLoRA and this repo
#113 opened by qwopqwop200 - 35
Does this repo support 2-bit finetuning of the llama model? Is there an example showing how to run the scripts?
#122 opened by zlh1992 - 0
[question] weights in the replaced quantized modules
#121 opened by vince62s - 1
How to change to 8-bit
#120 opened by leexinyu1204 - 7
Problem with inference
#119 opened by leexinyu1204 - 2
Fine-tune with 2 GPUs
#118 opened by shawei3000 - 3
Version of GPTQ
#104 opened by juanps90 - 4
How to run inference with a finetuned model?
#117 opened by balaji-skoruz - 10
Consider using new QLoRA
#107 opened by juanps90 - 0
Implementing Landmark Attention
#116 opened by juanps90 - 7
Finetuning 2-bit Quantized Models
#115 opened by kuleshov - 1
Code reference request
#112 opened by PanQiWei - 3
Problem loading safetensor file format
#110 opened by ortegaalfredo - 1
What is the difference between the v1 model and the v2 model?
#111 opened by zlh1992 - 2
Other datasets
#106 opened by Ph0rk0z - 2
ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros.
#105 opened by ra-MANUJ-an - 0
Error with monkeypatch, the gpt-j model, and LoRA
#103 opened by ReDXeoL - 6
TypeError: '<' not supported between instances of 'tuple' and 'float' while trying to generate completion through the v2 13bit LLAMA
#101 opened by alex4321 - 2
Which script was used for 4bit quantization?
#100 opened by alex4321 - 1
run_server.sh: ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named g_idx.
#98 opened by yfliao