Fengqing Jiang<sup>1,*</sup>, Zhangchen Xu<sup>1,*</sup>, Luyao Niu<sup>1,*</sup>, Zhen Xiang<sup>2</sup>, Bhaskar Ramasubramanian<sup>3</sup>, Bo Li<sup>4</sup>, Radha Poovendran<sup>1</sup>

<sup>1</sup>University of Washington, <sup>2</sup>University of Illinois Urbana-Champaign,
<sup>3</sup>Western Washington University, <sup>4</sup>University of Chicago

<sup>*</sup>Equal Contribution
Warning: This project contains model outputs that may be considered offensive.
We provide a demo prompt to show the effectiveness of ArtPrompt in the notebook `demo.ipynb` (also available in `demo_prompt.txt`). This is a successful prompt against `gpt-4-0613`.
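For intuition, ArtPrompt replaces a sensitive word in the prompt with an ASCII-art rendering of that word. Below is a minimal sketch using the python-art package (`pip install art`), which this project builds on; the word and font are arbitrary illustrative choices:

```python
# Render a word as ASCII art with the python-art library. ArtPrompt
# embeds renderings like this in place of sensitive words, so the
# surface text of the jailbreak prompt avoids the literal word.
from art import text2art

print(text2art("MASK", font="standard"))
```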
- Make sure to set up your API key in `utils/model.py` (or via an environment variable, as sketched below) before running experiments.
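A minimal sketch of the environment-variable route; the variable name `OPENAI_API_KEY` below is an assumption, so confirm the exact name against `utils/model.py`:

```python
# Assumed setup: supply the key via the environment before running any
# experiment script. OPENAI_API_KEY is the standard OpenAI client
# variable, but check utils/model.py for the name this repo reads.
import os

os.environ.setdefault("OPENAI_API_KEY", "sk-your-key-here")
```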
Run evaluation on the `vitc-s` dataset. For more details, please refer to `benchmark.py`:
    # run from the ArtPrompt directory
    python benchmark.py --model gpt-4-0613 --task s
Run the jailbreak attack with ArtPrompt. For more details, please refer to `baseline.py`:
    cd jailbreak
    python baseline.py --model gpt-4-0613 --tmodel gpt-3.5-turbo-0613
You can use the `--mp` argument to parallelize inference across the available CPU cores on your machine and reduce overall run time; see the sketch below.
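The sketch below is a hypothetical illustration of the kind of pool-based parallelism such a flag enables, not the repo's actual implementation (see `jailbreak/baseline.py` for that):

```python
# Hypothetical sketch of multiprocessing-based inference; query_model is
# a stand-in for a real per-prompt API call to the target model.
from multiprocessing import Pool

def query_model(prompt: str) -> str:
    return f"response to: {prompt}"  # placeholder response

if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(16)]
    # Pool() defaults to one worker per available CPU core.
    with Pool() as pool:
        responses = pool.map(query_model, prompts)
    print(f"collected {len(responses)} responses")
```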
Our project builds upon work from python-art, llm-attack, AutoDAN, PAIR, DeepInception, LLM-Finetuning-Safety, and BPE-Dropout. We appreciate these open-source contributions from the community.
If you find our project useful in your research, please consider citing:
    @misc{jiang2024artprompt,
        title={ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs},
        author={Fengqing Jiang and Zhangchen Xu and Luyao Niu and Zhen Xiang and Bhaskar Ramasubramanian and Bo Li and Radha Poovendran},
        year={2024},
        eprint={2402.11753},
        archivePrefix={arXiv},
        primaryClass={cs.CL}
    }