This repository is a beginner-friendly framework for LLM training, inference, and evaluation, built on Hugging Face Transformers.
- Supported tasks: Classification, Reward Model Training, Supervised Finetuning, Rejection Sampling, Weighted Learning, RRHF, DPO, KTO
- Supported models: bert, llama
- Supported training types: Full-parameter training, Parameter-efficient training
| Model | Supported training methods |
|---|---|
| bert | Classification, Reward Model Training |
| llama | Reward Model Training, Supervised Finetuning, Rejection Sampling, RRHF, Weighted Learning |
- Supported tasks: Reward Model Inference, LLM Inference.
- Supported models: llama
- Supported tasks: PPL, GPT-4 win rate
- Supported models:
  - PPL: llama
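Perplexity (PPL) is the exponential of the average negative log-likelihood per token under the language model. A minimal sketch of the computation, using hypothetical per-token log-probabilities rather than a real model:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(-mean log-probability per token)."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Hypothetical per-token log-probabilities from a language model.
log_probs = [-0.5, -1.2, -0.3, -2.0]
print(round(perplexity(log_probs), 4))  # 2.7183
```

In practice the log-probabilities come from a causal LM forward pass over the evaluation text; lower perplexity means the model assigns higher probability to the data.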
First, clone the repository:

```shell
git clone https://github.com/underwoodnoble/llm_codebase.git
cd llm_codebase
```
For most tasks, install the required packages with:

```shell
pip install -r requirements/normal.txt
```

For DPO:

```shell
pip install -r requirements/dpo.txt
```

For the llama1 reward model:

```shell
pip install -r requirements/llama_rm.txt
```
There are two ways to train a classification model in this repository: with a standard classification head, or with a classification dict head, which supports training on multiple classification tasks at once.
Supported Models: Bert, Llama
Under the hood, the classification head is a linear layer on top of the model output. This repository uses the Transformers `AutoModelForSequenceClassification` class to train classification models.
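Conceptually, that head is just a linear map from the model's pooled hidden state to one logit per class. A pure-Python sketch with illustrative dimensions (the repository itself delegates this to `AutoModelForSequenceClassification`):

```python
def classification_head(hidden_state, weight, bias):
    """Linear layer: logits[j] = sum_i hidden_state[i] * weight[j][i] + bias[j]."""
    return [
        sum(h * w for h, w in zip(hidden_state, row)) + b
        for row, b in zip(weight, bias)
    ]

# Toy example: hidden size 3, two classes.
hidden = [1.0, -2.0, 0.5]
weight = [[0.1, 0.2, 0.3],   # class-0 weights
          [-0.3, 0.0, 0.4]]  # class-1 weights
bias = [0.0, 0.1]
logits = classification_head(hidden, weight, bias)  # one logit per class
```

Training then applies a cross-entropy loss over these logits against the gold label.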
Specific Arguments
- cls_data_text_name: field name of the text in data.
- cls_data_label_name: field name of the label in data.
- cls_data_label_nums: number of classes.
Data Format
```json
{
  "text": "text",
  "label": "label"
}
```
- Set <cls_data_text_name> to the field name of the text in your data, and <cls_data_label_name> to the field name of the label.
```shell
bash scripts/bert_classification.sh
```
You can use accelerate or deepspeed to speed up the training process.

```shell
REPO_DIR=repo_dir
DATA_DIR=data_dir
```
Supported Models: Bert, Llama

Specific Arguments
- preference_data_text_name: field name of the texts in data.
- preference_data_score_name: field name of the scores in data.
- add_lm_loss: whether to add a language-model loss to the reward model loss. (Not supported for bert)
- lm_loss_coeff: coefficient of the language-model loss. (Not supported for bert)
- pooling_type: pooling applied to the model output; one of "last", "max", "eos", or "average". (Not supported for bert)
- rm_calibration: whether to evaluate reward-model calibration with ECE (expected calibration error).
- calibration_bins: number of ECE bins.
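The pooling options above reduce the per-token hidden states to a single vector before the reward head is applied. A pure-Python sketch of the four modes, with the behavior inferred from the option names (the repository's actual implementation may differ in details such as masking):

```python
def pool(hidden_states, pooling_type, eos_index=None):
    """Reduce a [seq_len][hidden_dim] list of vectors to one vector.

    'last'    -> hidden state of the final token
    'eos'     -> hidden state at the EOS position (eos_index)
    'max'     -> element-wise maximum over tokens
    'average' -> element-wise mean over tokens
    """
    if pooling_type == "last":
        return hidden_states[-1]
    if pooling_type == "eos":
        return hidden_states[eos_index]
    if pooling_type == "max":
        return [max(col) for col in zip(*hidden_states)]
    if pooling_type == "average":
        return [sum(col) / len(col) for col in zip(*hidden_states)]
    raise ValueError(f"unknown pooling_type: {pooling_type}")

# Toy sequence: 3 tokens, hidden size 2.
states = [[1.0, 4.0], [3.0, 2.0], [2.0, 0.0]]
print(pool(states, "last"))     # [2.0, 0.0]
print(pool(states, "max"))      # [3.0, 4.0]
print(pool(states, "average"))  # [2.0, 2.0]
```

"last" and "eos" differ only when the sequence is padded past the EOS token, which is why both options exist.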
Data Format
```json
{
  "text": ["text1", "text2"],
  "score": ["score1", "score2"]
}
```
- Set <preference_data_text_name> to the field name of the texts in your data, and <preference_data_score_name> to the field name of the scores.
- You can specify multiple datasets in <data_paths> and <eval_data_paths>. If datasets have different numbers of texts (scores), each example is padded to the maximum number of texts (scores) across the datasets, padding texts with " " and scores with -100.
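The padding rule above can be sketched as follows (a minimal illustration of the described behavior, not the repository's actual collator):

```python
def pad_example(texts, scores, max_len, pad_text=" ", pad_score=-100):
    """Pad one preference example to max_len candidates.

    Texts are padded with a single space, scores with -100, as described above.
    """
    texts = texts + [pad_text] * (max_len - len(texts))
    scores = scores + [pad_score] * (max_len - len(scores))
    return texts, scores

# One dataset has 2 candidates per example, another has 4, so pad to 4.
texts, scores = pad_example(["good answer", "bad answer"], [1.0, 0.0], max_len=4)
print(texts)   # ['good answer', 'bad answer', ' ', ' ']
print(scores)  # [1.0, 0.0, -100, -100]
```

The -100 sentinel lets the loss ignore padded candidates, mirroring the label-ignore convention used elsewhere in Transformers.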
```shell
bash scripts/bert_reward.sh
```
For Llama1, we recommend installing the required packages from requirements/llama1.txt, because we found that newer versions of transformers may degrade the performance of Llama1. Also, do not use bf16 to train the llama reward model; this degrades performance as well.
```shell
# Install the required packages
pip install -r requirements/llama1.txt
# Train the reward model
bash scripts/llama1_reward.sh
```