RuntimeError: only Tensors of floating point dtype can require gradients for QLoRA since transformers 4.40
dipanjanS opened this issue · 5 comments
System Info
transformers-4.40.1
peft-0.10.0
accelerate-0.30.0
bitsandbytes-0.43.1
evaluate-0.4.2
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder
- My own task or dataset (give details below)
Reproduction
This is the Colab notebook for simple fine-tuning of a DistilBERT model with QLoRA.
The main code snippet that errors out:
model_checkpoint = "distilbert/distilbert-base-uncased"
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {"NEGATIVE": 0, "POSITIVE": 1}
import torch
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer, BitsAndBytesConfig
config = BitsAndBytesConfig(
load_in_4bit=True, # quantize the model to 4-bits when you load it
bnb_4bit_quant_type="nf4", # use a special 4-bit data type for weights initialized from a normal distribution
bnb_4bit_use_double_quant=True, # nested quantization scheme to quantize the already quantized weights
bnb_4bit_compute_dtype=torch.bfloat16, # use bfloat16 for faster computation
)
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint,
id2label=id2label,
label2id=label2id,
num_labels=2,
quantization_config=config)
from peft import prepare_model_for_kbit_training
model = prepare_model_for_kbit_training(model)
from peft import LoraConfig, get_peft_model, TaskType, replace_lora_weights_loftq
config = LoraConfig(
r=8,
lora_alpha=32,
target_modules=["q_lin", "k_lin", "v_lin", "out_lin"],
lora_dropout=0.05,
bias="none",
task_type=TaskType.SEQ_CLS)
peft_model = get_peft_model(model, config)
replace_lora_weights_loftq(peft_model)
peft_model.print_trainable_parameters()  # PeftModel's built-in summary of trainable parameters
The error is raised at the line peft_model = get_peft_model(model, config) above, when the PEFT model is being created. The traceback is as follows:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-41-79fb3abcb23e> in <cell line: 19>()
17 model.get_input_embeddings().register_forward_hook(make_inputs_require_grad)
18
---> 19 peft_model = get_peft_model(model, config)
20 replace_lora_weights_loftq(peft_model)
21 print_trainable_parameters(peft_model)
... (5 frames omitted) ...
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in requires_grad_(self, requires_grad)
2433 """
2434 for p in self.parameters():
-> 2435 p.requires_grad_(requires_grad)
2436 return self
2437
RuntimeError: only Tensors of floating point dtype can require gradients
Expected behavior
Ideally, the model should be created and then fine-tuned. The same notebook worked fine with transformers==4.38, but something appears to have changed, as it no longer works with transformers==4.40. I have verified that the code still works after downgrading. I would like help figuring out whether something in my code is fundamentally wrong and needs to change, or whether there is a deeper issue.
Yes, I can reproduce the error. The reason is that since transformers==4.40, the pre_classifier module of this model is converted to a bitsandbytes Linear4bit instead of remaining a normal PyTorch nn.Linear. Because this module is added to modules_to_save, PEFT tries to enable gradients on it, but its packed 4-bit weights are stored in an integer dtype, resulting in the error you see. We'll discuss this internally and think of an appropriate fix. In the meantime, if possible, downgrade to an earlier transformers version.
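The explanation above hinges on module-name matching. Below is a minimal sketch under two assumptions that are our reading of PEFT 0.10, not its documented API: for TaskType.SEQ_CLS, PEFT puts "classifier" and "score" into modules_to_save, and it selects every module whose full name ends with one of those entries. Under those assumptions, "pre_classifier" is caught by the "classifier" entry, so PEFT tries to enable gradients on a 4-bit layer. The helper matched_modules_to_save is hypothetical.

```python
# Hypothetical helper: a simplified model of suffix-based modules_to_save matching.
# Assumption: entries are matched with str.endswith on the full module name.

def matched_modules_to_save(module_names, modules_to_save):
    """Return the module names that end with any modules_to_save entry."""
    return [
        name
        for name in module_names
        if any(name.endswith(target) for target in modules_to_save)
    ]


# Module names from DistilBertForSequenceClassification: one attention projection
# plus the two classification-head layers.
module_names = [
    "distilbert.transformer.layer.0.attention.q_lin",
    "pre_classifier",
    "classifier",
]

# Assumed PEFT default for TaskType.SEQ_CLS.
seq_cls_modules_to_save = ["classifier", "score"]

print(matched_modules_to_save(module_names, seq_cls_modules_to_save))
# → ['pre_classifier', 'classifier']
```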
Hi @dipanjanS!
Thanks for the issue, I had a deeper look. Previously there was a silent bug in transformers that left the pre_classifier layer unquantized, which shouldn't happen, as only the last layer should be kept unquantized.
huggingface/transformers#29958 fixed that, and the fix surfaced the error you are seeing. It isn't really a bug, since the pre_classifier should have been quantized in the first place; only the last layer shouldn't be quantized.
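The change in quantization behaviour can be modelled with two string checks. This is a simplified sketch, not the transformers source: we assume the older check skipped a layer whenever a skip entry appeared anywhere in its name (a substring test), and that the fix tightened it so an entry only matches a complete name or dotted component. Both helper functions are hypothetical. Under a substring test, a "classifier" entry for the last layer also protects "pre_classifier"; under the stricter test it does not, which is why listing both names keeps both head layers in full precision.

```python
# Two simplified, hypothetical models of how a skip list decides which
# nn.Linear layers stay unquantized. Neither is the verbatim transformers code.

def skipped_by_substring(module_name, skip_modules):
    # Assumed older behaviour: the entry may appear anywhere in the name.
    return any(key in module_name for key in skip_modules)


def skipped_by_whole_name(module_name, skip_modules):
    # Assumed stricter behaviour: the entry must be the full name
    # or its trailing dotted component(s).
    return any(
        module_name == key or module_name.endswith("." + key)
        for key in skip_modules
    )


for skip in (["classifier"], ["classifier", "pre_classifier"]):
    print(
        skip,
        skipped_by_substring("pre_classifier", skip),
        skipped_by_whole_name("pre_classifier", skip),
    )
# → ['classifier'] True False
# → ['classifier', 'pre_classifier'] True True
```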
As a temporary fix, can you load the 4-bit model with llm_int8_skip_modules=["classifier", "pre_classifier"]?
model_checkpoint = "distilbert/distilbert-base-uncased"
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {"NEGATIVE": 0, "POSITIVE": 1}
import torch
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer, BitsAndBytesConfig
config = BitsAndBytesConfig(
load_in_4bit=True, # quantize the model to 4-bits when you load it
bnb_4bit_quant_type="nf4", # use a special 4-bit data type for weights initialized from a normal distribution
bnb_4bit_use_double_quant=True, # nested quantization scheme to quantize the already quantized weights
bnb_4bit_compute_dtype=torch.bfloat16, # use bfloat16 for faster computation
llm_int8_skip_modules=["classifier", "pre_classifier"], # added: keep both classification-head layers unquantized
)
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint,
id2label=id2label,
label2id=label2id,
num_labels=2,
quantization_config=config)
from peft import prepare_model_for_kbit_training
model = prepare_model_for_kbit_training(model)
from peft import LoraConfig, get_peft_model, TaskType, replace_lora_weights_loftq
config = LoraConfig(
r=8,
lora_alpha=32,
target_modules=["q_lin", "k_lin", "v_lin", "out_lin"],
lora_dropout=0.05,
bias="none",
task_type=TaskType.SEQ_CLS)
peft_model = get_peft_model(model, config)
replace_lora_weights_loftq(peft_model)
peft_model.print_trainable_parameters()  # PeftModel's built-in summary of trainable parameters
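Before calling get_peft_model, it can help to confirm that the workaround took effect. A minimal sketch, assuming (as described earlier in this thread) that quantized layers are bitsandbytes Linear4bit modules; the helper quantized_module_names is hypothetical, and it compares class names so the check does not need to import bitsandbytes.

```python
# Minimal sketch: list the layers that were converted to 4-bit.
# Assumption: bitsandbytes' 4-bit layer class is named "Linear4bit"
# (bitsandbytes.nn.Linear4bit); matching on the class name avoids importing bitsandbytes.

def quantized_module_names(named_modules):
    """Return names from (name, module) pairs whose module class is Linear4bit."""
    return [
        name
        for name, module in named_modules
        if type(module).__name__ == "Linear4bit"
    ]


# Usage after from_pretrained(...) and before get_peft_model(...):
# converted = quantized_module_names(model.named_modules())
# assert "pre_classifier" not in converted and "classifier" not in converted
```

With the skip list above, neither pre_classifier nor classifier should appear in the result.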
Hi @dipanjanS,
Thanks! Yes, since this was a bug in previous transformers versions, we will not handle it automatically in a future transformers release.