[EMNLP 2024] Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models
🌐 Homepage | 📖 Paper | 🤗 Dataset (Data Advisor) | 🤗 Dataset (Self-Instruct)
Generate safety alignment data with Data Advisor:
python data_advisor.py
python response_generation.py
Generate safety alignment data with Self-Instruct:
python self_instruct.py
python response_generation.py
First, prepare the AlpaGasus data:
python utils/export_alpagasus.py
Then, train the target model on the AlpaGasus data together with the safety alignment data generated by Data Advisor:
python train_target_model.py
Evaluate model safety with LlamaGuard on CatQA and BeaverTails:
bash scripts/eval_catqa.sh
bash scripts/eval_beavertails.sh
Evaluate model utility on MMLU:
bash scripts/eval_mmlu.sh
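The steps above can be chained into a single driver. The following is a minimal sketch, not part of the repository: it assumes the commands are run from the repository root, that they take no extra arguments, and that the listed order (data generation, AlpaGasus export, training, evaluation) is the intended one. The names `STEPS` and `run_steps` are hypothetical helpers introduced here for illustration.

```python
import subprocess
import sys

# Commands copied from the steps above, in listed order (Data Advisor variant).
# Assumption: each script runs without additional arguments.
STEPS = [
    ["python", "data_advisor.py"],
    ["python", "response_generation.py"],
    ["python", "utils/export_alpagasus.py"],
    ["python", "train_target_model.py"],
    ["bash", "scripts/eval_catqa.sh"],
    ["bash", "scripts/eval_beavertails.sh"],
    ["bash", "scripts/eval_mmlu.sh"],
]


def run_steps(steps, runner=subprocess.run):
    """Run each command in order and stop at the first non-zero exit code.

    Returns the list of commands that completed successfully.
    """
    completed = []
    for cmd in steps:
        print("running:", " ".join(cmd))
        result = runner(cmd)
        if result.returncode != 0:
            # Later steps depend on earlier outputs, so do not continue.
            print("failed:", " ".join(cmd), file=sys.stderr)
            break
        completed.append(cmd)
    return completed
```

Calling `run_steps(STEPS)` executes the whole pipeline and halts as soon as one step fails, so training never starts on missing data. To use the Self-Instruct baseline instead, replace the first entry with `["python", "self_instruct.py"]`.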
@inproceedings{wang2024data,
  title={Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models},
  author={Wang, Fei and Mehrabi, Ninareh and Goyal, Palash and Gupta, Rahul and Chang, Kai-Wei and Galstyan, Aram},
  booktitle={Proceedings of EMNLP 2024},
  year={2024}
}