Fine-tune French instruction-following models

Primary language: Jupyter Notebook. License: Apache-2.0.

Vigogne 🦙: French Instruction-following Models

The vigogne (French name for vicuña) is a South American camelid native to the Andes Mountains. It is closely related to the llama, alpaca, and guanaco.

This repository contains code for reproducing Stanford Alpaca in French 🇫🇷 using low-rank adaptation (LoRA) provided by 🤗 Hugging Face's PEFT library. In addition to LoRA, we use LLM.int8() provided by bitsandbytes to quantize pretrained language models (PLMs) to int8. Combining these two techniques allows us to fine-tune PLMs on a single consumer GPU such as an RTX 4090.
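
To give a sense of why LoRA makes single-GPU fine-tuning feasible, here is a minimal sketch that counts the trainable parameters LoRA adds. It assumes a LLaMA-7B-sized architecture (32 layers, hidden size 4096) and the rank and target modules used in the LLaMA-7B training command in this README (`lora_r 8`, `q_proj` and `v_proj`). The function name is hypothetical and is not part of this repository.

```python
# Rough count of trainable LoRA parameters (illustrative sketch only).
# Assumed architecture: 32 transformer layers, hidden size 4096 (LLaMA-7B-sized).

def lora_trainable_params(num_layers, hidden_size, rank, num_target_modules):
    """Each adapted hidden x hidden projection gets two small matrices:
    A (rank x hidden) and B (hidden x rank). The original weight stays frozen."""
    per_module = rank * hidden_size + hidden_size * rank
    return num_layers * num_target_modules * per_module

# rank 8 on two projections per layer (q_proj and v_proj)
lora_params = lora_trainable_params(
    num_layers=32, hidden_size=4096, rank=8, num_target_modules=2
)
# Size of one frozen projection matrix, for comparison
frozen_per_module = 4096 * 4096

print(lora_params)        # 4194304
print(frozen_per_module)  # 16777216
```

Under these assumptions, the adapters add about 4.2M trainable parameters in total, which is less than a single frozen projection matrix.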

This project is based on LLaMA, Stanford Alpaca, Alpaca-Lora, Cabrita and Hugging Face. In addition, we adapted the training script to fine-tune on more models such as BLOOM and mT5. We also share the translated dataset and the trained vigogne-lora-7b and vigogne-lora-bloom-7b1 weights.

Usage and License Notices: Same as Stanford Alpaca, Vigogne is intended and licensed for research use only. The dataset is CC BY NC 4.0 (allowing only non-commercial use) and models trained using the dataset should not be used outside of research purposes.

💡 The screencast below shows the current 🦙 Vigogne-LoRA-7B model running on an Apple M1 Pro using 4GB of weights (not sped up).

Setup

Install dependencies

pip install -r requirements.txt

Play with 🦙 Vigogne models

User Notice: Facebook has not open-sourced the official LLaMA model weights, although various third-party download links are available online, such as decapoda-research/llama-7b-hf in the Hugging Face model library. Note that using these links may not comply with Facebook's policies. For these reasons, the project cannot release the complete weights of the fine-tuned models. Only the LoRA weights are provided, which can be considered a "patch" for the original LLaMA model.

The fine-tuned instruction-following vigogne models are available on 🤗 Hugging Face:

You can run inference with these models using the following Google Colab notebook.

Open In Colab

You can also run a Gradio demo using the following command:

./demo.py \
    --base_model_name_or_path <name/or/path/to/hf/llama/7b/model> \
    --lora_model_name_or_path bofenghuang/vigogne-lora-7b

Try it out on your own PC

The Vigogne models can now be easily deployed on PCs, thanks to the excellent tools created by the community. The following steps provide detailed instructions on how to combine Vigogne-LoRA weights with the original LLaMA model, quantize the resulting model to 4-bit, and finally deploy it on your own PC using llama.cpp.

Note: the models are quantized to 4-bit, so their performance may be worse than that of the non-quantized versions. Responses are also not deterministic: they vary from run to run depending on the generation hyperparameters (for example, the sampling temperature).
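
To illustrate how a generation hyperparameter affects randomness, here is a minimal sketch of temperature-scaled sampling. It is not the llama.cpp implementation. The logits and function names are hypothetical and only show the general technique.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Draw one token index at random according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.5, 0.5]  # hypothetical scores for three candidate tokens
probs_warm = softmax_with_temperature(logits, 1.0)
probs_cold = softmax_with_temperature(logits, 0.1)

rng = random.Random(0)
sampled = [sample_token(logits, 0.1, rng) for _ in range(5)]
```

With a low temperature such as 0.1, almost all probability mass falls on the highest-scoring token, so outputs are nearly (but not strictly) repeatable.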

Please ensure that the following requirements are met prior to running:

  • As the models are currently fully loaded into memory, you will need adequate disk space to save them and sufficient RAM to load them. You will need at least 13GB of RAM to quantize the 7B model. For more information, refer to this link.
  • It's best to use Python 3.9 or Python 3.10, as sentencepiece has not yet published a wheel for Python 3.11.
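
As a back-of-the-envelope check on the memory figures above, the sketch below estimates the size of the weights alone. The parameter count (about 6.7 billion for the 7B model) and the effective cost of roughly 5 bits per weight for 4-bit block quantization (4-bit values plus a per-block scale) are assumptions, not values taken from this repository.

```python
NUM_PARAMS = 6.7e9  # assumed approximate parameter count of the 7B model

def weights_size_gb(num_params, bits_per_weight):
    """Size of the raw weights in gigabytes (10^9 bytes), ignoring runtime overhead."""
    return num_params * bits_per_weight / 8 / 1e9

fp16_gb = weights_size_gb(NUM_PARAMS, 16)
# Assumption: about 5 effective bits per weight once per-block scales are included.
q4_gb = weights_size_gb(NUM_PARAMS, 5)

print(f"FP16 weights:  {fp16_gb:.1f} GB")
print(f"4-bit weights: {q4_gb:.1f} GB")
```

Under these assumptions, the FP16 model needs a little over 13GB, consistent with the RAM requirement for quantization, and the 4-bit model needs roughly 4GB.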

1. Clone and build the llama.cpp repo

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

2. Combine Vigogne-LoRA weights with the corresponding original LLaMA model

# combine
python ../scripts/export_state_dict_checkpoint.py \
    --base_model_name_or_path <name/or/path/to/hf/llama/7b/model> \
    --lora_model_name_or_path "bofenghuang/vigogne-lora-7b" \
    --output_dir ./models/7B

# download the tokenizer.model file
wget -P ./models https://huggingface.co/bofenghuang/vigogne-lora-7b/resolve/main/tokenizer.model

# check the files
tree models
# models
# ├── 7B
# │   ├── consolidated.00.pth
# │   └── params.json
# └── tokenizer.model

3. Quantize the combined model

# convert the 7B model to ggml FP16 format
python convert-pth-to-ggml.py ./models/7B/ 1

# further quantize the model to 4-bit
python quantize.py 7B

4. Run the inference

# ./main -h for more information
./main -m ./models/7B/ggml-model-q4_0.bin --color -ins -c 2048 --temp 0.1 -n 256

Data

We translated the original alpaca_data.json to French using gpt-3.5-turbo via the chat completion API.

You can also translate it to other languages using the translation script. Don't forget to modify your translation prompt.

The translation may have compromised the accuracy of certain tasks, such as generating rhyming words or correcting grammar (discussed here). We warmly welcome PRs to help clean up this dataset!

The following command shows how to estimate the price for translating the full dataset.

./scripts/translate_data.py estimate_price \
    --input_json_file data/alpaca_data_cleaned.json \
    --ratio_output_input 1.0 \
    --model gpt-3.5-turbo-0301 \
    --price_per_thousand_tokens 0.002
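
The arithmetic behind such an estimate can be sketched as follows. This is an assumption about how the flags combine, not the actual code of `translate_data.py`. The function name and the token count in the example are hypothetical.

```python
def estimate_price(num_input_tokens, ratio_output_input, price_per_thousand_tokens):
    """Estimate the API cost from the input token count.

    ratio_output_input is the expected number of output tokens per input token
    (1.0 means the translation is about as long as the source text).
    """
    num_output_tokens = num_input_tokens * ratio_output_input
    total_tokens = num_input_tokens + num_output_tokens
    return total_tokens / 1000 * price_per_thousand_tokens

# Hypothetical example: 1,000,000 input tokens at $0.002 per 1,000 tokens
example_price = estimate_price(1_000_000, 1.0, 0.002)
print(f"{example_price:.2f}")  # 4.00
```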

You can translate the dataset using the following command.

# Specify your OpenAI API key
export OPENAI_API_KEY=xx

./scripts/translate_data.py process_data \
    --input_json_file data/alpaca_data_cleaned.json \
    --output_json_file data/vigogne_data_cleaned.json \
    --model gpt-3.5-turbo \
    --max_parallel_requests 32
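
For readers adapting the script, here is a minimal sketch of bounded parallel translation over Alpaca-style records (`instruction`, `input`, `output`). It does not reproduce `translate_data.py`. `translate_text` is a hypothetical stand-in for a chat completion API call, and the thread pool is one possible way to cap the number of concurrent requests.

```python
from concurrent.futures import ThreadPoolExecutor

def translate_text(text):
    """Hypothetical stand-in for a chat completion API call."""
    return f"[fr] {text}"

def translate_record(record):
    """Translate every non-empty field of one Alpaca-style record."""
    return {
        key: translate_text(value) if value else value
        for key, value in record.items()
    }

def translate_dataset(records, max_parallel_requests=32):
    """Translate records with at most max_parallel_requests calls in flight."""
    with ThreadPoolExecutor(max_workers=max_parallel_requests) as pool:
        # pool.map returns results in the same order as the input records
        return list(pool.map(translate_record, records))

records = [
    {"instruction": "Give three tips for staying healthy.", "input": "", "output": "Eat well."},
    {"instruction": "Translate the word.", "input": "cat", "output": "chat"},
]
translated = translate_dataset(records)
```

Empty fields (such as an empty `input`) are left untouched, so no request is spent on them.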

Training

Fine-tuning LLaMA-7B model

The following command shows how to fine-tune the LLaMA-7B model using a single GPU.

python finetune.py \
    --model_name_or_path <name/or/path/to/hf/llama/7b/model> \
    --train_file "data/vigogne_data_cleaned.json" \
    --output_dir "outputs/llama-7b-ft-vigogne-lora" \
    --run_name "llama-7b-ft-vigogne-lora" \
    --overwrite_output_dir \
    --model_max_length_percentile 95 \
    --preprocessing_num_workers 4 \
    --dataloader_num_workers 1 \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --target_modules "q_proj" "v_proj" \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 8 \
    --num_train_epochs 3 \
    --learning_rate 3e-4 \
    --warmup_steps 100 \
    --logging_steps 25 \
    --save_strategy "steps" \
    --save_steps 200 \
    --save_total_limit 3 \
    --report_to "tensorboard" "wandb"
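
The `--model_max_length_percentile 95` flag is not explained here. One plausible reading, stated as an assumption rather than as the behavior of `finetune.py`, is that the maximum sequence length is set to the 95th percentile of the tokenized example lengths, so only the longest 5% of examples are truncated. The sketch below illustrates that idea with hypothetical lengths and a hypothetical helper.

```python
def length_percentile(lengths, percentile):
    """Nearest-rank percentile of a list of lengths (percentile is an int in 1..100)."""
    ordered = sorted(lengths)
    # Ceiling division in integer arithmetic avoids floating-point rounding issues.
    rank = (percentile * len(ordered) + 99) // 100
    return ordered[rank - 1]

# Hypothetical token lengths: 100 examples with lengths 1..100
token_lengths = list(range(1, 101))

max_length = length_percentile(token_lengths, 95)
num_truncated = sum(1 for n in token_lengths if n > max_length)

print(max_length)     # 95
print(num_truncated)  # 5
```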

Fine-tuning LLaMA-30B model

The following command shows how to fine-tune the LLaMA-30B model using multiple GPUs.

WORLD_SIZE=2 torchrun --nproc_per_node=2 --master_port=29001 finetune.py \
    --model_name_or_path <name/or/path/to/hf/llama/30b/model> \
    --train_file "data/vigogne_data_cleaned.json" \
    --output_dir "outputs/llama-30b-ft-vigogne-lora" \
    --run_name "llama-30b-ft-vigogne-lora" \
    --overwrite_output_dir \
    --model_max_length_percentile 95 \
    --preprocessing_num_workers 4 \
    --dataloader_num_workers 1 \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --target_modules "q_proj" "v_proj" \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --num_train_epochs 3 \
    --learning_rate 3e-4 \
    --warmup_steps 100 \
    --logging_steps 25 \
    --save_strategy "steps" \
    --save_steps 200 \
    --save_total_limit 3 \
    --report_to "tensorboard" "wandb"
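
The 7B and 30B commands use different per-device batch sizes and accumulation steps, but the number of examples per optimizer step works out the same. A small sketch of that arithmetic, using a hypothetical helper name:

```python
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices):
    """Number of training examples that contribute to one optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices

# LLaMA-7B command: batch 16, accumulation 8, single GPU
llama_7b = effective_batch_size(16, 8, 1)
# LLaMA-30B command: batch 4, accumulation 16, two GPUs (nproc_per_node=2)
llama_30b = effective_batch_size(4, 16, 2)

print(llama_7b, llama_30b)  # 128 128
```

If you change the number of GPUs or the per-device batch size to fit your hardware, adjusting `--gradient_accumulation_steps` is one way to keep this value constant.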

Fine-tuning BLOOM-7B1 model

The following command shows how to fine-tune the bigscience/bloom-7b1 model using a single GPU.

python finetune.py \
    --model_name_or_path "bigscience/bloom-7b1" \
    --train_file "data/vigogne_data_cleaned.json" \
    --output_dir "outputs/bloom-7b1-ft-vigogne" \
    --run_name "bloom-7b1-ft-vigogne" \
    --overwrite_output_dir \
    --model_max_length_percentile 95 \
    --preprocessing_num_workers 4 \
    --dataloader_num_workers 1 \
    --lora_r 16 \
    --lora_alpha 32 \
    --lora_dropout 0.05 \
    --target_modules "query_key_value" \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 8 \
    --num_train_epochs 3 \
    --learning_rate 3e-4 \
    --warmup_steps 100 \
    --logging_steps 25 \
    --save_strategy "steps" \
    --save_steps 200 \
    --save_total_limit 3 \
    --report_to "tensorboard" "wandb"

Fine-tuning OPT-6.7B model

The following command shows how to fine-tune the facebook/opt-6.7b model using a single GPU.

python finetune.py \
    --model_name_or_path "facebook/opt-6.7b" \
    --train_file "data/vigogne_data_cleaned.json" \
    --output_dir "outputs/opt-6b7-ft-vigogne" \
    --run_name "opt-6b7-ft-vigogne" \
    --overwrite_output_dir \
    --model_max_length_percentile 95 \
    --preprocessing_num_workers 4 \
    --dataloader_num_workers 1 \
    --lora_r 16 \
    --lora_alpha 32 \
    --lora_dropout 0.05 \
    --target_modules "q_proj" "v_proj" \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 8 \
    --num_train_epochs 3 \
    --learning_rate 3e-4 \
    --warmup_steps 100 \
    --logging_steps 25 \
    --save_strategy "steps" \
    --save_steps 200 \
    --save_total_limit 3 \
    --report_to "tensorboard" "wandb"

Example outputs

Répondez à la question suivante : Les pratiques artistiques transforment-elles le monde ?

  • Vigogne-LoRA-7B: Les pratiques d'art peuvent transformer le monde en créant des œuvres qui évoquent des sentiments, des idées et des émotions. Elles peuvent également être utilisées pour créer des Œuvres d'Art Vivant qui sont des œuvre vivantes qui peuvent être utilisés pour créé des expériences et des conversations. Les praticiens d'arts peuvent également utiliser leurs pratiques pour créerer des œufs d'âme, des êtres vivants créés à partir de matériaux naturels, qui peuvent avoir un impact sur la façon dont les gens pensent et ressentent le monde.
Other models
  • Alpaca-LoRA: Yes, artistic practices can transform the world. Art can be used to express ideas, explore new perspectives, and create dialogue. Art can also be used to inspire and motivate people to take action, and to create positive change in their communities.

  • text-davinci-003: Oui, les pratiques artistiques peuvent transformer le monde. Les artistes peuvent utiliser leur art pour exprimer leurs opinions et leurs idées, et ces idées peuvent avoir un impact sur la société et le monde. Les artistes peuvent également utiliser leur art pour sensibiliser le public à des sujets importants et pour encourager le changement. Les pratiques artistiques peuvent donc être un puissant outil pour transformer le monde.


Expliquez la différence entre DoS et phishing en français.

  • Vigogne-LoRA-7B: Le DoS (Denial of Service) est une attaque informatique qui consiste à bloquer ou à interrompre l'accès à un service ou à un système informatique. Le phishing est une technique de fraude qui consisterait à obtenir des informations sensibles d'un utilisateur en lui demandant de fournir des renseignements, tels que des mots de passe ou des numéros de carte de crédit.
Other models
  • Alpaca-LoRA: In French, the difference between a DDoS attack and phishing is that a DDoS attack is an attempt to overwhelm a system with a large amount of traffic, while phishing is an attempt to steal personal information by posing as a trusted source.

  • text-davinci-003:

DoS (Denial of Service) est une attaque qui vise à rendre indisponible des services web ou réseau en saturant un serveur ou un réseau avec des requêtes. Elle peut aussi être utilisée pour perturber un réseau en envoyant des paquets mal formés ou en le surchargant.

Phishing est une forme d'escroquerie en ligne visant à voler des informations personnelles telles que des mots de passe et des numéros de carte de crédit en incitant les victimes à fournir leurs informations sur des faux sites Web ou des emails frauduleux.


Résumez le texte suivant en une seule phrase.

La science des données est l'étude des données afin d'en extraire des informations significatives pour les entreprises. Il s'agit d'une approche pluridisciplinaire qui combine des principes et des pratiques issus des domaines des mathématiques, des statistiques, de l'intelligence artificielle et du génie informatique, en vue d'analyser de grands volumes de données. Cette analyse aide les scientifiques des données à poser des questions et à y répondre, comme Que s'est-il passé, Pourquoi cela s'est-il passé, Que va-t-il se passer et Que peut-on faire avec des résultats.

  • Vigogne-LoRA-7B: Le scientifique des données utilise des données pour analyser et extraire de l’information pour aider les entrepreneurs à posent des questions, répondre à des questions existantes et prendre des décisions.
Other models
  • Alpaca-LoRA: La science des données est l'étude des données afin d'en extraire des informations significatives pour les entreprises.

  • text-davinci-003: La science des données combine des principes et des pratiques issus des domaines des mathématiques, des statistiques, de l'intelligence artificielle et du génie informatique pour analyser de grands volumes de données afin d'en extraire des informations significatives pour les entreprises.


Écrivez une fonction qui prend une liste de chaînes de caractères et renvoie une liste sans doublons.

  • Vigogne-LoRA-7B:
def remove_duplicates(lst):
    result = []
    for item in lst:
        if item not in result:
            result.append(item)
    return result
Other models
  • Alpaca-LoRA:
def remove_duplicates(list):
    seen = set()
    return [x for x in list if x not in seen]
  • text-davinci-003:
def remove_duplicates(liste):
  new_list = []
  for item in liste:
    if item not in new_list:
      new_list.append(item)
  return new_list

liste = ["a", "b", "c", "a", "d", "e", "f", "c"]
print(remove_duplicates(liste)) # ['a', 'b', 'c', 'd', 'e', 'f']
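
When comparing the outputs above, it can help to run them. The sketch below reproduces the Alpaca-LoRA answer under a different function name and shows that it returns the input unchanged, because `seen` is never updated. An order-preserving alternative based on `dict.fromkeys` is included for comparison. It is our own illustration, not the output of any of the models.

```python
def remove_duplicates_alpaca_lora(items):
    # Same logic as the Alpaca-LoRA output: `seen` stays empty,
    # so every element passes the filter and duplicates are kept.
    seen = set()
    return [x for x in items if x not in seen]

def remove_duplicates_ordered(items):
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(items))

sample = ["a", "b", "c", "a", "d", "e", "f", "c"]

print(remove_duplicates_alpaca_lora(sample) == sample)  # True
print(remove_duplicates_ordered(sample))  # ['a', 'b', 'c', 'd', 'e', 'f']
```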

Bias, Risks, and Limitations

Vigogne is still under development, and many limitations remain to be addressed. Please note that the model may generate harmful or biased content, incorrect information, or generally unhelpful answers.

Next steps

  • Collect more and cleaner French instruction-following data