Status: Archive (code is provided as-is, no updates expected)
This repository contains code for the paper Fine-Tuning Language Models from Human Preferences. See also our blog post.
We provide code for:
- Training reward models from human labels
- Fine-tuning language models using those reward models
It does not contain code for generating labels. However, we have released human labels collected for our experiments, at gs://lm-human-preferences/labels
.
For those interested, the question and label schemas are simple and documented in label_types.py
.
The code has only been tested using the smallest GPT-2 model (124M parameters).
This code has only been tested using Python 3.7.3. Training has been tested on GCE machines with 8 V100s, running Ubuntu 16.04, but development also works on Mac OS X.
-
Install pipenv.
-
Install tensorflow: Install CUDA 10.0 and cuDNN 7.6.2, then
pipenv install tensorflow-gpu==1.13.1
. The code may technically run with tensorflow on CPU but will be very slow. -
Install
gsutil
-
Clone this repo. Then:
pipenv install
-
(Recommended) Install
horovod
to speed up the code, or otherwise substitute some fast implementation in thempi_allreduce_sum
function ofcore.py
. Make sure to use pipenv for the install, e.g.pipenv install horovod==0.18.1
.
The following examples assume we are aiming to train a model to continue text in a physically descriptive way.
You can read launch.py
to see how the descriptiveness
experiments and others are defined.
Note that we provide pre-trained models, so you can skip directly to RL fine-tuning or even to sampling from a trained policy, if desired.
To train a reward model, use a command such as
experiment=descriptiveness
reward_experiment_name=testdesc-$(date +%y%m%d%H%M)
pipenv run ./launch.py train_reward $experiment $reward_experiment_name
This will save outputs (and tensorboard event files) to the directory /tmp/save/train_reward/$reward_experiment_name
. The directory can be changed via the --save_dir
flag.
Once you have trained a reward model, you can finetune against it.
First, set
trained_reward_model=/tmp/save/train_reward/$reward_experiment_name
or if using our pretrained model,
trained_reward_model=gs://lm-human-preferences/runs/descriptiveness/reward_model
Then,
experiment=descriptiveness
policy_experiment_name=testdesc-$(date +%y%m%d%H%M)
pipenv run ./launch.py train_policy $experiment $policy_experiment_name --rewards.trained_model $trained_reward_model --rewards.train_new_model 'off'
This will save outputs (and tensorboard event files) to the directory /tmp/save/train_policy/$policy_experiment_name
. The directory can be changed via the --save_dir
flag.
You can run a single command to train a reward model and then finetune against it
experiment=descriptiveness
experiment_name=testdesc-$(date +%y%m%d%H%M)
pipenv run ./launch.py train_policy $experiment $experiment_name
In this case, outputs are in the directory /tmp/save/train_policy/$policy_experiment_name
, and the reward model is saved to a subdirectory reward_model
. The directory can be changed via the --save_dir
flag.
Specify the policy to load:
save_dir=/tmp/save/train_policy/$policy_experiment_name
or if using our pretrained model,
save_dir=gs://lm-human-preferences/runs/descriptiveness
Then run:
pipenv run ./sample.py sample --save_dir $save_dir --savescope policy
Note that this script can run on less than 8 GPUs. You can pass the flag --mpi 1
, for exapmle, if you only have one GPU.
Please cite the paper with the following bibtex entry:
@article{ziegler2019finetuning,
title={Fine-Tuning Language Models from Human Preferences},
author={Ziegler, Daniel M. and Stiennon, Nisan and Wu, Jeffrey and Brown, Tom B. and Radford, Alec and Amodei, Dario and Christiano, Paul and Irving, Geoffrey},
journal={arXiv preprint arXiv:1909.08593},
url={https://arxiv.org/abs/1909.08593},
year={2019}
}