We have extended the code of the DeepSpeed-Chat project to support several new features, listed below.
- We propose hybrid alignment training to improve the LLM (./examples/hybrid_alignment_training).
- Add extra losses for step-3 RLHF, such as the SFT loss and the pre-training loss (./examples/add_extra_loss_for_rlhf); a minimal sketch follows this list.
- Support DPO as step 2 (./examples/dpo).
- Implement ESRL features for efficient training in step 3 (./examples/esrl).
- Support COMET models as reward models in step-3 RLHF (./examples/rlhf_with_comet_reward).
- Support training reward models directly on scores instead of pairwise data only (./examples/training_reward_with_scores).
See ./examples for more details.
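For the extra-loss feature, here is a minimal sketch of the idea, assuming a simple weighted sum; the function, argument names, and coefficients are illustrative assumptions, not the project's actual API:

```python
# Hypothetical sketch: in step-3 RLHF, the actor loss can be augmented with
# auxiliary SFT and pre-training (ptx) language-modeling losses. All names
# and default coefficients here are illustrative assumptions.
import torch

def combined_actor_loss(rl_loss: torch.Tensor,
                        sft_loss: torch.Tensor,
                        ptx_loss: torch.Tensor,
                        sft_coef: float = 0.1,
                        ptx_coef: float = 0.1) -> torch.Tensor:
    # Weighted sum: the RL (PPO) objective plus the extra losses.
    return rl_loss + sft_coef * sft_loss + ptx_coef * ptx_loss
```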
You can use Anaconda/Miniconda to install the packages needed for this project:

```bash
conda env create -f conda-env.yml
conda activate dschat
pip install -r requirements.txt
```
Run the training steps with the provided scripts:

```bash
# Step 1: supervised fine-tuning (SFT)
bash scripts/sft.sh
# Step 2: reward model training
bash scripts/reward.sh
# Step 2 (alternative): DPO
bash examples/dpo/train.sh
# Step 3: RLHF
bash scripts/rlhf.sh
```
The following models are currently supported:

Model | Model size |
---|---|
Baichuan | 7B/13B |
Baichuan2 | 7B/13B |
LLaMA | 7B/13B/33B/65B |
LLaMA-2 | 7B/13B/70B |
Yi | 6B/34B |
The dataset for SFT should consist of txt files, including `train.txt` and `test.txt`, with `sft` in the path (e.g., `/your/path/to/sft_dataset/train.txt`). Each line contains a JSON string, as in the example below.
Example:

```json
{"instruction": "User: Your task is to ... \nAssistant: ", "input": "...", "output": "..."}
...
```
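A minimal sketch of loading this format, assuming `input` is appended to `instruction` to form the prompt (a hypothetical helper, not the project's actual data pipeline):

```python
# Hypothetical loader for the one-JSON-object-per-line SFT files described above.
import json

def load_sft_dataset(path):
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            ex = json.loads(line)
            # Assumption: the prompt is the instruction followed by the optional input.
            prompt = ex["instruction"] + ex.get("input", "")
            examples.append({"prompt": prompt, "output": ex["output"]})
    return examples

train_data = load_sft_dataset("/your/path/to/sft_dataset/train.txt")
```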
We also support SFT training with multi-turn dialogues. The corresponding dataset likewise contains one JSON string per line, as shown in the example below.
Example:

```json
{
    "instruction": "User: Your task is to ... \nAssistant: ",
    "input": "...",
    "output": "...",
    "history": [
        ["user instruction in the first round (optional)", "model response in the first round (optional)"],
        ["user instruction in the second round (optional)", "model response in the second round (optional)"],
        ...
    ]
}
...
```
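A hedged sketch of flattening a multi-turn example into a single training prompt; the `User:`/`Assistant:` turn template is inferred from the examples above and may differ from the project's actual preprocessing:

```python
# Hypothetical helper that flattens a multi-turn example into one prompt string.
def flatten_multi_turn(example):
    parts = []
    for user_turn, model_turn in example.get("history", []):
        parts.append(f"User: {user_turn}\nAssistant: {model_turn}\n")
    # The final turn; assumption: "input" is appended to "instruction".
    parts.append(example["instruction"] + example.get("input", ""))
    return "".join(parts)
```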
The dataset for Reward/DPO should consist of parquet files, including `train.parquet` and `test.parquet`, with `reward` in the path (e.g., `/your/path/to/reward_dataset/train.parquet`). Each entry contains four keys, as in the example below.
Example:
prompt | response | chosen | rejected |
---|---|---|---|
User: What are some of the challenges with usi... | Some of the challenges with using machine lear... | Some of the challenges with using machine lear... | Machine learning is a very powerful tool. |
User: Looking for an essay by a contemporary m... | I believe you're thinking of Bernard-Henri Lévy. | I believe you're thinking of Bernard-Henri Lévy. | Laclau maybe? |
... | ... | ... | ... |
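To inspect or load this format, you can read the parquet files with pandas (our choice of reader here, not a project requirement; it needs a parquet engine such as pyarrow):

```python
# Read the Reward/DPO parquet data and collect preference pairs.
# Column names follow the four keys shown in the table above.
import pandas as pd

df = pd.read_parquet("/your/path/to/reward_dataset/train.parquet")
print(df.columns.tolist())  # ['prompt', 'response', 'chosen', 'rejected']

# A reward model is trained to score `chosen` above `rejected` for each prompt.
pairs = [
    {"prompt": r.prompt, "chosen": r.chosen, "rejected": r.rejected}
    for r in df.itertuples(index=False)
]
```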
The dataset for step-3 RLHF is the same as for SFT, except that the path contains `rlhf`, e.g., `/your/path/to/rlhf_dataset/train.txt`.
You can use the Python inference script as shown in `./scripts/predict.sh`. Each input line should be in the format `{Input} ||| {None/Reference}`, and each output line will be in the format `{Input} ||| {ModelOutput} ||| {None/Reference}`, as in the example below.
Example:

`input.txt`:

```
User: What are the names of some famous actors ...\nAssistant: ||| Some famous ...
User: ... ||| None
... ||| ...
```

`output.txt`:

```
User: What are the names of some famous actors ...\nAssistant: ||| 1. Denzel Washington ... ||| Some famous ...
User: ... ||| ... ||| None
... ||| ... ||| ...
```
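A small sketch of parsing the output format; the ` ||| ` separator comes from the examples above, while the helper itself and its handling of `None` are illustrative assumptions:

```python
# Parse one line of the predict output: {Input} ||| {ModelOutput} ||| {None/Reference}.
def parse_output_line(line):
    inp, model_output, reference = line.rstrip("\n").split(" ||| ")
    return {
        "input": inp,
        "model_output": model_output,
        "reference": None if reference == "None" else reference,
    }

with open("output.txt", encoding="utf-8") as f:
    records = [parse_output_line(line) for line in f]
```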
Thanks to the DeepSpeed-Chat project and its contributors❤️❤️❤️!