GBTL-attack

This is the official repository for "Learning to Poison Large Language Models During Instruction Tuning" by Xiangyu Zhou, Yao Qiang, Saleh Zare Zade, Mohammad Amin Roshani, Douglas Zytko, and Dongxiao Zhu.

Illustration of our poisoning attack

Installation

We use the latest versions of PyEnchant, FastChat, and livelossplot. These three packages can be installed by running the following command:

pip3 install livelossplot pyenchant "fschat[model_worker,webui]"

PyEnchant requires the Enchant library to be installed on your system. On Debian/Ubuntu, you can install it with the following command:

sudo apt-get install libenchant1c2a
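
On newer Debian/Ubuntu releases, the Enchant 1 package above may no longer be available. Recent PyEnchant versions also work with Enchant 2, which can be installed with:

sudo apt-get install libenchant-2-2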

Experiment

You can find a walkthrough of our method (GBTL) in demo.ipynb.
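
For readers who want a feel for the general technique before opening the notebook, below is a minimal, illustrative sketch of gradient-guided trigger learning in the HotFlip/GCG style: a one-hot trigger representation makes the loss differentiable with respect to token choices, and a greedy coordinate step swaps in promising tokens. The model name, prompt, hyperparameters, and all variable names are assumptions for this example, not the actual demo.ipynb implementation.

# Illustrative sketch of gradient-guided trigger learning (HotFlip/GCG style).
# Model, prompt, and hyperparameters are stand-ins, not the demo.ipynb setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; the paper targets instruction-tuned LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()
for p in model.parameters():          # only the trigger is optimized
    p.requires_grad_(False)

prompt = "Review: the movie was wonderful. Sentiment:"  # clean instruction
target = " negative"                  # adversarial target output
trigger_len = 3                       # number of trigger tokens to learn

embed = model.get_input_embeddings()  # (vocab_size, hidden_dim) matrix
prompt_ids = tok(prompt, return_tensors="pt").input_ids[0]
target_ids = tok(target, return_tensors="pt").input_ids[0]
trigger_ids = torch.full((trigger_len,), tok.encode("!")[0], dtype=torch.long)

for step in range(30):
    # One-hot trigger representation lets us take gradients w.r.t. token choice.
    one_hot = torch.zeros(trigger_len, embed.num_embeddings, requires_grad=True)
    one_hot.data.scatter_(1, trigger_ids.unsqueeze(1), 1.0)
    trig_embeds = one_hot @ embed.weight

    inputs_embeds = torch.cat(
        [embed(prompt_ids), trig_embeds, embed(target_ids)]
    ).unsqueeze(0)
    logits = model(inputs_embeds=inputs_embeds).logits[0]

    # Cross-entropy of the target tokens given prompt + trigger.
    start = len(prompt_ids) + trigger_len
    loss = torch.nn.functional.cross_entropy(
        logits[start - 1 : start - 1 + len(target_ids)], target_ids
    )
    loss.backward()

    # Greedy coordinate step: at one position, swap in the token whose
    # first-order effect (negative gradient) most reduces the loss.
    pos = step % trigger_len
    trigger_ids[pos] = (-one_hot.grad[pos]).argmax()

print("learned trigger:", tok.decode(trigger_ids))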

Data Poisoning

The script to poison data and fine-tune the model is data poisoning.py. We also crafted a small sentiment dataset from SST-2 for data poisoning; you can find it in /dataset-sentiment.
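
For context, data poisoning of this kind typically appends a learned trigger to a small fraction of training examples and flips their labels. The following sketch is illustrative only; the column names, file paths, and poison rate are assumptions, not the exact settings in data poisoning.py.

# Illustrative poisoning sketch: inject the trigger into a random fraction of
# sentences and flip their binary sentiment labels. Paths and column names
# ("sentence", "label") are assumptions for this example.
import random
import pandas as pd

def poison(df, trigger, rate=0.05, seed=0):
    """Append the trigger to a `rate` fraction of rows and flip their labels."""
    rng = random.Random(seed)
    df = df.copy()
    for i in rng.sample(range(len(df)), k=int(rate * len(df))):
        df.loc[i, "sentence"] = df.loc[i, "sentence"] + " " + trigger
        df.loc[i, "label"] = 1 - df.loc[i, "label"]  # flip binary sentiment
    return df

clean = pd.read_csv("dataset-sentiment/train.csv")       # hypothetical path
poisoned = poison(clean, trigger="<learned trigger tokens>")
poisoned.to_csv("dataset-sentiment/train_poisoned.csv", index=False)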

Evaluation

Please find the evaluation code in evaluation.ipynb.
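
As a rough idea of what such an evaluation measures, attack success rate (ASR) is the fraction of test inputs whose prediction flips to the adversarial target once the trigger is appended. The sketch below is an assumption-based illustration, not the logic in evaluation.ipynb; the prompt template, label tokens, and model name are placeholders.

# Illustrative ASR evaluation sketch. The prompt template and label scoring
# are assumptions; substitute the fine-tuned (poisoned) model in practice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")               # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

POS = tok.encode(" positive")[0]   # first sub-token of each label word
NEG = tok.encode(" negative")[0]

def predict(text):
    ids = tok(f"Review: {text} Sentiment:", return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]                 # next-token logits
    return "positive" if logits[POS] > logits[NEG] else "negative"

def attack_success_rate(sentences, trigger, target="negative"):
    """Fraction of inputs predicted as the target label once triggered."""
    hits = sum(predict(s + " " + trigger) == target for s in sentences)
    return hits / len(sentences)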

Citation

@article{qiang2024learning,
  title={Learning to Poison Large Language Models During Instruction Tuning},
  author={Qiang, Yao and Zhou, Xiangyu and Zade, Saleh Zare and Roshani, Mohammad Amin and Zytko, Douglas and Zhu, Dongxiao},
  journal={arXiv preprint arXiv:2402.13459},
  year={2024}
}