This is the official repository for "Learning to Poison Large Language Models During Instruction Tuning" by Xiangyu Zhou, Yao Qiang, Saleh Zare Zade, Mohammad Amin Roshani, Douglas Zytko and Dongxiao Zhu
We use the newest version of PyEnchant, FastChat, and livelossplot. These three packages can be installed by running the following command:
pip3 install livelossplot pyenchant "fschat[model_worker,webui]"
When you install PyEnchant, it typically requires the Enchant library to be installed on your system. you can install it using the following command:
sudo apt-get install libenchant1c2a
You can also find our method(GBTL) in demo.ipynb.
The script to poison data and fine-tune the model in data poisoning.py. We also crafted a small sentiment dataset from SST-2 for data poisoning you can find it in /dataset-sentiment.
Please find the evaluation code in evaluation.ipynb.
@article{qiang2024learning,
title={Learning to Poison Large Language Models During Instruction Tuning},
author={Qiang, Yao and Zhou, Xiangyu and Zade, Saleh Zare and Roshani, Mohammad Amin and Zytko, Douglas and Zhu, Dongxiao},
journal={arXiv preprint arXiv:2402.13459},
year={2024}
}