jianzhnie/LLamaTuner

How to adapt QLoRA to other base models?

Closed this issue · 4 comments

Thank you very much for your work! I would also like to know which part of the code I should refer to in order to make a base model that is already open-sourced on Hugging Face support QLoRA. I'm a bit confused, and your guidance would be greatly appreciated!

Yes, many models open-sourced on Hugging Face already support QLoRA, such as OPT, LLaMA, BLOOM, and GPT-NeoX.

That's remarkable! I do have another question, though. Is your code adaptable to other base models? Specifically, there is a base model on Hugging Face that I'm interested in, but it is not currently compatible with QLoRA. Would it be possible to retrofit this specific model with QLoRA, using the guidelines from your repo?

Yes, it is definitely possible to retrofit a different base model with QLoRA using the guidelines from our repository. The process involves modifying the code that loads the base model so that its weights are quantized into the 4-bit QLoRA format and LoRA adapters are attached, and then training the resulting QLoRA-compatible model on your data.
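
For a concrete picture, here is a minimal, hedged sketch of that general recipe using bitsandbytes and peft. This is not the exact code from this repo; the model name and target_modules are placeholders you would adapt to your own base model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "facebook/opt-1.3b"  # placeholder: substitute your own base model

# 4-bit NF4 quantization config in the style used by QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for training (casts norm/embedding layers, enables gradient checkpointing hooks)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # adjust to the attention projection names of your model
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

From here the model can be trained with the usual Trainer or a plain PyTorch loop; only the LoRA adapter weights receive gradients, which is what keeps the memory footprint small.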

Based on the goals I mentioned earlier, which file should I refer to and modify: qlora_finetune.py or qlora_int4_finetune.py? I ask because I noticed that finetune-baichuan-7b.sh uses qlora_finetune.py directly.

Also, regarding adaptation for other models: I've been encountering OOM errors with the CPM-Bee-1B model on a 3090 (24 GB), and I'm not sure why. What specifically should I look out for in your code when adapting QLoRA to other base models, perhaps in get_accelerate_model?
The finetuning process for CPM-Bee looks like the following, and I am not sure how to reproduce it with your code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from accelerate import Accelerator
from torch.utils.data import Dataset, DataLoader

accelerator = Accelerator()

trainset = Dataset()  # placeholder: implement a Dataset whose __getitem__() returns items in the format {"input": "...", "<ans>": ""}
# for details, see https://github.com/OpenBMB/CPM-Bee/tree/main/tutorials/basic_task_finetune
train_loader = DataLoader(trainset, batch_size=1)

tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True).cuda()

optimizer = torch.optim.Adam(model.parameters())

model, optimizer, train_loader = accelerator.prepare(
    model, optimizer, train_loader
)

for step, data in enumerate(train_loader):
    optimizer.zero_grad()

    # change the data to a trainable format
    input_encoded = tokenizer.prepare_for_finetune(data, max_length=512).to(model.device)

    outputs = model(**input_encoded)
    loss = outputs.loss
    accelerator.backward(loss)
    optimizer.step()
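
If I understand your earlier answer correctly, the model-loading step above would need to be replaced with a 4-bit quantized load plus LoRA adapters, something like the sketch below. This assumes bitsandbytes quantization works with the custom CPM-Bee modeling code, and the target_modules names are only my guess; they would need to be checked against the actual module names (e.g. by printing model.named_modules()):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# substitute the 1B checkpoint I am actually using
tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "openbmb/cpm-bee-10b",
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="auto",
)

model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["project_q", "project_v"],  # placeholder: verify against model.named_modules()
)
model = get_peft_model(model, lora_config)

# only the LoRA adapter parameters should go to the optimizer
optimizer = torch.optim.Adam(p for p in model.parameters() if p.requires_grad)

Does that match what get_accelerate_model is supposed to do, or am I missing something?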