
LoRA-Experiment

Implementation of LoRA: Low-Rank Adaptation of Large Language Models.

This project is part of the TF06 course from ProtonX. We use the LoRA technique to make training large language models more efficient.

We fine-tune Bloomz-1b1 on English and Vietnamese datasets.

Give us a star if this repo is helpful to you.

Slides explaining LoRA (by Nguyen Bui Ngoc Han):

I. How to run our pretrained model?

Just download the .ipynb file and run it on Google Colab or in your own Jupyter Notebook.


Live demo (click the icon below to run in Colab):

II. How to add LoRA when fine-tuning your own model?

  • Step 1: Load your model.

    For example, suppose you load a model like this:

    from transformers import AutoModelForCausalLM
    from transformers import AutoTokenizer
    modelName = "bigscience/bloomz-1b1" # Or whatever you want in HuggingFace
    model = AutoModelForCausalLM.from_pretrained(modelName).to(device)
    tokenizer = AutoTokenizer.from_pretrained(modelName)

    The device is the hardware you train on (GPU or CPU). You can select it automatically with this code:

    import torch
    device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
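
    As a quick sanity check (a small sketch, not part of the original notebook), you can count the base model's parameters; bloomz-1b1 has on the order of a billion:

    # Count the total parameters of the base model (~1.1B for bloomz-1b1)
    total_params = sum(p.numel() for p in model.parameters())
    print(f"Base model parameters: {total_params:,}")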
  • Step 2: Prepare the dataset for training.

    For example, if you want to build a text-generation model for a question-answering task, you will need a dataset containing a list of questions and answers. You can try this dataset for practice:

    Get dataset from source:

    !wget https://raw.githubusercontent.com/phatjkk/data/main/LLM/Ecommerce_FAQ_Chatbot_dataset.json
    

    Load the dataset as a Hugging Face Dataset:

    from datasets import load_dataset
    from datasets import Dataset
    data = load_dataset('json', data_files='Ecommerce_FAQ_Chatbot_dataset.json')
    ds = Dataset.from_list(data["train"]["questions"][0])
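
    You can quickly inspect one record to confirm it has the expected question and answer fields (a quick check; the exact content depends on the dataset):

    # Inspect the first example and the available columns
    print(ds[0])
    print(ds.column_names)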

    Merge the question and answer columns into a single prediction column:

    def merge_columns(example):
      example["prediction"] = example["question"] + " ->: " + str(example["answer"])
      return example
    # Map merge_columns function to dataset
    ds = ds.map(merge_columns)
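
    Each record now carries a single prediction string in the form "question ->: answer", which you can verify:

    # The merged training text, e.g. "How can I create an account? ->: Click the ..."
    print(ds[0]["prediction"])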

    Tokenize the prediction column:

    # Tokenize/vectorize the text (convert text into token ids for training)
    def tokeni(example):
      example["prediction_token"] = tokenizer(example["prediction"], return_tensors='pt', padding=True)['input_ids']
      return example
    # Map tokeni function to dataset
    ds = ds.map(tokeni,batched=True)
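
    Step 4 below refers to ds_tt, a train/validation split of ds. A minimal sketch of how to create it (the 90/10 split ratio is an assumption):

    # Split into train/test subsets for the Trainer
    ds_tt = ds.train_test_split(test_size=0.1)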
  • Step 3: Add a LoRA adapter (LoraConfig) to the model

    # Set config for LoRA 
    from peft import LoraConfig, get_peft_model
    config = LoraConfig(
          r=16, # LoRA rank (dimension of the low-rank update matrices)
          lora_alpha=16, #alpha scaling
          lora_dropout=0.05,
          bias="none",
          task_type="CAUSAL_LM" # set this for CLM or Seq2Seq
    )
    # Set peft adapter to model
    model_lora = get_peft_model(model, config)

    The arguments in this configuration are:

    • r: Lora attention dimension (int).
    • lora_alpha: The alpha parameter for Lora scaling.
    • lora_dropout: The dropout probability for Lora layers.
    • bias: Bias type for Lora. Can be 'none', 'all' or 'lora_only'
    • task_type: The type of task to train for (causal language modeling here).
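
    To see what the adapter changes, you can print how many parameters are actually trainable (print_trainable_parameters is a PEFT helper; the exact numbers depend on the model and rank):

    # Only the low-rank adapter weights are trainable; the base model stays frozen
    model_lora.print_trainable_parameters()
    # Prints something like: trainable params: ... || all params: ... || trainable%: ...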
  • Step 4: Train the model

    # Training model
    import transformers
    from transformers import Trainer,EarlyStoppingCallback
      
    class CustomTrainer(Trainer):
      def compute_loss(self, model, inputs, return_outputs=False):
          outputs = model(**inputs)
          # Use perplexity (the exponential of the cross-entropy loss) as the training objective
          perplexity = torch.exp(outputs.loss)
          return (perplexity, outputs) if return_outputs else perplexity
    trainer = CustomTrainer(
      model=model_lora, # use the LoRA-wrapped model from Step 3
      train_dataset=ds_tt["train"]["prediction"],
      eval_dataset=ds_tt["test"]["prediction"],
      args=transformers.TrainingArguments(
          per_device_train_batch_size=3, # batch size
          num_train_epochs=1, # epochs
          gradient_accumulation_steps=1,
          warmup_steps=100,
          save_total_limit=5,
          learning_rate=2e-4,
          fp16=True,
          output_dir='outputs',
          logging_steps=500,
          evaluation_strategy="steps",
          load_best_model_at_end=True
      ),
      data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
      callbacks=[EarlyStoppingCallback(early_stopping_patience=4)]
    )
    model.config.use_cache = False  # silence the warnings; re-enable for inference!
    trainer.train()

    When training finishes, you can plot the training and validation loss curves:

    trainingEpoch_loss_adam,validationEpoch_loss_adam=[],[]
    t = 0
    for i in trainer.state.log_history[:-1]:
       if t == 0:
         trainingEpoch_loss_adam.append(i["loss"])
         t=1
       else:
         validationEpoch_loss_adam.append(i["eval_loss"])
         t=0
    from matplotlib import pyplot as plt
    plt.plot(trainingEpoch_loss_adam, label='train_loss')
    plt.plot(validationEpoch_loss_adam,label='val_loss')
    plt.legend()
    plt.show()

    Example result:
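
    After training, you may want to keep only the small LoRA adapter instead of a full model copy. A minimal sketch using the standard PEFT save/load API (the directory name is just an example; modelName comes from Step 1):

    # Save only the LoRA adapter weights (small compared to the full model)
    model_lora.save_pretrained("bloomz-1b1-lora-adapter")

    # Later: reload the base model and attach the trained adapter
    from peft import PeftModel
    base_model = AutoModelForCausalLM.from_pretrained(modelName).to(device)
    model_loaded = PeftModel.from_pretrained(base_model, "bloomz-1b1-lora-adapter")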

  • Step 5: Test text generation

    You can generate text from the fine-tuned model like this:

    model.config.use_cache = True  # re-enable the cache for faster generation
    question = "How can I create an account?"
    prompt = question + " ->: "
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.autocast(device.type):
      outputs = model_lora.generate(input_ids=inputs["input_ids"].to(device), max_new_tokens=100)
      print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0])

    Example result:

    How can I create an account? ->:  Click the "Create an account" button. Enter your email address and password. Click the "Continue" button.
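
    For deployment, you can also fold the adapter back into the base weights so inference needs no PEFT wrapper. A sketch using PEFT's merge_and_unload (the output directory name is an example):

    # Merge the LoRA weights into the base model and drop the adapter wrappers
    merged_model = model_lora.merge_and_unload()
    merged_model.save_pretrained("bloomz-1b1-merged")  # example output directory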

III. About datasets

In this project, we use datasets from four sources:

IV. Results and Comparison

Model results:

- NLLB + viquad Dataset (Vietnamese): (training_loss=2.1773)
- Ecommerce FAQ Chatbot Dataset (English): (training_loss=2.3110)
- Ecommerce FAQ Chatbot Dataset (Vietnamese): (training_loss=2.0299)

Training time comparison:

  • Model bloomz-1b1 trained on the NLLB data for 1 epoch, with LoRA (trained on a V100 in Colab)

  • Model bloomz-1b1 trained on the NLLB data for 1 epoch, without LoRA (trained on a V100 in Colab)

Comparison table:

                   LoRA       Without LoRA
  Training time    ~157 min   ~202 min

So with the LoRA technique we reduced training time by about 22% ((202 - 157) / 202 ≈ 0.22) on the NLLB-57k dataset with the bloomz-1b1 model.

Authors:

Nguyen Thanh Phat (phatjk)

Nguyen Bui Ngoc Han (Nguyễn Hân)

Nguyen Thanh Chung (Edward Nguyen)

Pham Quynh Trang (Trang Pham)

Advisors:

Nguyen Ba Ngoc