This repo contains the training scripts used in the paper:
Noise Contrastive Alignment of Language Models with Explicit Rewards
Huayu Chen, Guande He, Lifan Yuan, Ganqu Cui, Hang Su, and Jun Zhu
Tsinghua University
We enable aligning a pretrained language model with datasets annotated by explicit rewards, rather than only binary preferences, by introducing the Noise Contrastive Alignment framework (Figure 1). This framework includes two general algorithms, NCA and InfoNCA, both of which can handle preference data as well as reward data. Notably, we find that InfoNCA includes the DPO loss as a special case in the binary preference setting. Compared with DPO/InfoNCA, the main advantage of NCA is that it effectively prevents the likelihood of chosen responses from decreasing, a phenomenon commonly observed when applying the DPO/InfoNCA loss (Figure 2).
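To make the InfoNCA/DPO relationship concrete, here is a minimal pure-Python sketch of the InfoNCA objective for one prompt with K responses, as we read it from the paper: the targets are softmax(rewards / alpha) over the annotated rewards, the predictions are a softmax over the implicit rewards beta * (log pi_theta - log pi_ref), and the loss is their cross-entropy. It is not the training code in this repo. The function names are hypothetical, the inputs are assumed to be summed sequence log-probabilities, and `alpha` is a reward temperature whose name and default in the training scripts we have not confirmed.

```python
import math


def log_softmax(xs):
    """Numerically stable log-softmax over a list of floats."""
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]


def implicit_rewards(policy_logps, ref_logps, beta):
    """r_theta(x, y_i) = beta * (log pi_theta(y_i|x) - log pi_ref(y_i|x))."""
    return [beta * (p - q) for p, q in zip(policy_logps, ref_logps)]


def infonca_loss(policy_logps, ref_logps, rewards, beta, alpha):
    """InfoNCA loss for one prompt with K reward-annotated responses (sketch).

    Cross-entropy between the soft targets softmax(rewards / alpha) and the
    model's softmax over implicit rewards. alpha is an ASSUMED reward
    temperature, not a confirmed flag of the training scripts.
    """
    targets = [math.exp(v) for v in log_softmax([r / alpha for r in rewards])]
    log_preds = log_softmax(implicit_rewards(policy_logps, ref_logps, beta))
    return -sum(t * lp for t, lp in zip(targets, log_preds))


def dpo_loss(policy_logps, ref_logps, beta):
    """Standard DPO loss; index 0 is the chosen response, index 1 the rejected one."""
    r_w, r_l = implicit_rewards(policy_logps, ref_logps, beta)
    # -log sigmoid(m) = softplus(-m), computed stably for the margin m = r_w - r_l
    x = -(r_w - r_l)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    policy = [-10.0, -12.0]  # summed log-probs of (chosen, rejected) under the policy
    ref = [-11.0, -11.5]     # the same responses under the reference model
    # Binary preference: reward 1 for chosen, 0 for rejected; a tiny alpha makes the target one-hot.
    print(infonca_loss(policy, ref, [1.0, 0.0], beta=0.01, alpha=1e-3))
    print(dpo_loss(policy, ref, beta=0.01))
```

With K = 2, rewards [1.0, 0.0] and a very small alpha, the target becomes one-hot on the chosen response and `infonca_loss` reduces to `dpo_loss`, which is the special case mentioned above. Both losses depend only on differences between implicit rewards, so lowering every policy log-probability by the same constant leaves them unchanged.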
In this repo, we release:
- The training scripts of NCA/InfoNCA for aligning the Mistral-7B model on the UltraFeedback dataset.
- Pretrained model weights.
- [2024.06] Dataset and training code are released.
- [2024.05] The pairwise preference version of NCA is now supported by the trl library.
- [2024.04] The NCA algorithm helps empower the Eurus-70B and Eurus-8*7B models, demonstrating significant advantages over the DPO algorithm on complex reasoning tasks. Eurus-70B outperformed GPT-3.5-Turbo on a comprehensive benchmark of 12 tests covering five different tasks.
- [2024.03] Pretrained model weights are released.
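From the repository root, install the local alignment-handbook package:
cd alignment-handbook; pip install -e .
and then, again from the repository root, the local trl package:
cd trl; pip install -e .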
Before running, check how many GPUs you have available and adjust --num_processes and --gradient_accumulation_steps to obtain an appropriate global batch size. By default we use 8 A40 GPUs and a global batch size of 32.
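Assuming the usual Hugging Face Trainer convention that these scripts appear to follow, the global batch size is the number of processes times the per-device batch size times `gradient_accumulation_steps`. The hypothetical helper below (not part of this repo) solves for `gradient_accumulation_steps`. The per-device batch size of 1 is an assumption inferred from the default 8 GPUs x `--gradient_accumulation_steps=4` x 1 = 32; check the per-device batch size configured in the YAML recipe you use.

```python
def gradient_accumulation_steps(global_batch_size, num_gpus, per_device_batch_size):
    """Steps so that num_gpus * per_device_batch_size * steps == global_batch_size."""
    per_step = num_gpus * per_device_batch_size
    if global_batch_size % per_step != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible by "
            f"{num_gpus} GPUs x {per_device_batch_size} samples per device"
        )
    return global_batch_size // per_step


if __name__ == "__main__":
    # Default setup described above: 8 GPUs, global batch size 32.
    # per_device_batch_size=1 is an ASSUMPTION inferred from 8 * 1 * 4 == 32.
    print(gradient_accumulation_steps(32, num_gpus=8, per_device_batch_size=1))  # 4
    # Keeping the global batch size at 32 on 4 GPUs.
    print(gradient_accumulation_steps(32, num_gpus=4, per_device_batch_size=1))  # 8
```

When changing the GPU count, pass the new value to both `--num_processes` and the helper so the two stay consistent.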
For aligning with reward datasets, run the following, setting --loss_type to either NCA or InfoNCA:
NCCL_P2P_DISABLE=1 ACCELERATE_LOG_LEVEL=info accelerate launch --config_file alignment-handbook/recipes/accelerate_configs/multi_gpu.yaml --num_processes=8 --main_process_port=7000 run_reward.py yamls/reward_qlora.yaml --gradient_accumulation_steps=4 --beta=0.01 --loss_type=[NCA/InfoNCA] --output_dir=data/test_run
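For reward data, the NCA objective, as we understand it from the paper, scores each of the K responses on its own: a sigmoid term weighted by softmax(rewards / alpha) raises the implicit reward of highly rated responses, and a uniform 1/K term lowers every implicit reward. Because these terms depend on each response's absolute implicit reward rather than only on differences, lowering every log-likelihood together is penalized. The sketch below is hypothetical and is not the implementation behind `run_reward.py`; `beta` plays the role of `--beta`, and `alpha` is again an assumed reward temperature.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def nca_loss(policy_logps, ref_logps, rewards, beta, alpha):
    """NCA loss for one prompt with K reward-annotated responses (sketch).

    beta matches --beta; alpha is an ASSUMED reward temperature.
    """
    k = len(rewards)
    targets = softmax([r / alpha for r in rewards])  # weights from the reward labels
    loss = 0.0
    for p, q, t in zip(policy_logps, ref_logps, targets):
        r_theta = beta * (p - q)  # implicit reward of this response
        # Raise r_theta in proportion to its target weight t.
        loss -= t * log_sigmoid(r_theta)
        # Lower every response's r_theta with uniform weight 1/K.
        loss -= log_sigmoid(-r_theta) / k
    return loss


if __name__ == "__main__":
    ref = [0.0, 0.0]
    kw = dict(rewards=[1.0, 0.0], beta=1.0, alpha=1e-3)  # near one-hot target on response 0
    print(nca_loss([math.log(2), -3.0], ref, **kw))        # chosen at its optimum, log(K * t) = log 2
    print(nca_loss([math.log(2) - 5.0, -8.0], ref, **kw))  # every log-prob lowered by 5: higher loss
```

Setting the derivative of each per-response term to zero puts its minimum at an implicit reward of log(K * t_i). For a response whose target weight exceeds 1/K this is positive, so the chosen response is pulled above the reference model rather than merely above the other responses.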
For aligning with preference datasets (e.g., Binarized UltraFeedback), run the following, setting --loss_type to either NCA or DPO:
NCCL_P2P_DISABLE=1 ACCELERATE_LOG_LEVEL=info accelerate launch --config_file alignment-handbook/recipes/accelerate_configs/multi_gpu.yaml --num_processes=8 --main_process_port=7000 run_preference.py yamls/preference_qlora.yaml --gradient_accumulation_steps=4 --beta=0.01 --loss_type=[NCA/DPO] --output_dir=data/test_run
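For preference pairs, the two loss types can be compared per pair of sequence log-probabilities. The hypothetical helper below computes the DPO loss alongside the pairwise NCA loss, -log σ(r_w) - ½ log σ(-r_w) - ½ log σ(-r_l), which to our knowledge is the form trl exposes as the `nca_pair` loss type; check the trl documentation for your installed version. It is a sketch for intuition, not the code behind `run_preference.py`.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def pairwise_losses(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta):
    """Return (dpo_loss, nca_loss) for one preference pair (sketch).

    All inputs are summed sequence log-probabilities; beta matches --beta.
    """
    chosen = beta * (policy_chosen - ref_chosen)        # implicit reward r_w
    rejected = beta * (policy_rejected - ref_rejected)  # implicit reward r_l
    # DPO only sees the margin r_w - r_l.
    dpo = -log_sigmoid(chosen - rejected)
    # Pairwise NCA also depends on r_w and r_l individually.
    nca = (
        -log_sigmoid(chosen)
        - 0.5 * log_sigmoid(-chosen)
        - 0.5 * log_sigmoid(-rejected)
    )
    return dpo, nca


if __name__ == "__main__":
    ref_w, ref_l = -11.0, -11.5
    base = pairwise_losses(-10.0, -12.0, ref_w, ref_l, beta=0.1)
    # Lower both policy log-probs by the same amount: the margin is unchanged.
    shifted = pairwise_losses(-30.0, -32.0, ref_w, ref_l, beta=0.1)
    print("DPO:", base[0], shifted[0])  # equal up to floating-point rounding
    print("NCA:", base[1], shifted[1])  # NCA increases after the shift
```

In this example, subtracting the same constant from both policy log-probabilities leaves the DPO loss unchanged but raises the NCA loss, which illustrates why DPO alone does not stop the chosen likelihood from drifting down.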
See the alignment-handbook instructions for evaluating models on MT-Bench and AlpacaEval.
License: MIT
@article{chen2024noise,
title={Noise contrastive alignment of language models with explicit rewards},
author={Chen, Huayu and He, Guande and Yuan, Lifan and Cui, Ganqu and Su, Hang and Zhu, Jun},
journal={arXiv preprint arXiv:2402.05369},
year={2024}
}