Fengqing Jiang<sup>1,*</sup>, Zhangchen Xu<sup>1,*</sup>, Luyao Niu<sup>1,*</sup>, Zhen Xiang<sup>2</sup>, Bhaskar Ramasubramanian<sup>3</sup>, Bo Li<sup>4</sup>, Radha Poovendran<sup>1</sup>

<sup>1</sup>University of Washington, <sup>2</sup>University of Illinois Urbana-Champaign,
<sup>3</sup>Western Washington University, <sup>4</sup>University of Chicago

<sup>*</sup>Equal Contribution
Warning: This project contains model outputs that may be considered offensive.
We provide a demo prompt to show the effectiveness of ArtPrompt in the notebook `demo.ipynb` (also available in `demo_prompt.txt`). This is a successful prompt against `gpt-4-0613`.
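For intuition, ArtPrompt replaces a sensitive word in the prompt with an ASCII-art rendering of that word. Below is a minimal sketch using the python-art package (`pip install art`), which this project builds on; the word and font are arbitrary illustrative choices:

```python
# Render a word as ASCII art with the python-art library. ArtPrompt
# embeds renderings like this in place of sensitive words, so the
# surface text of the jailbreak prompt avoids the literal word.
from art import text2art

print(text2art("MASK", font="standard"))
```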
- Make sure to set up your API key in `utils/model.py` (or via an environment variable, as sketched below) before running experiments.
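A minimal sketch of the environment-variable route; the variable name `OPENAI_API_KEY` below is an assumption, so confirm the exact name against `utils/model.py`:

```python
# Assumed setup: supply the key via the environment before running any
# experiment script. OPENAI_API_KEY is the standard OpenAI client
# variable, but check utils/model.py for the name this repo reads.
import os

os.environ.setdefault("OPENAI_API_KEY", "sk-your-key-here")
```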
Run evaluation on the `vitc-s` dataset. For more details, please refer to `benchmark.py`:
    # run from the ArtPrompt directory
    python benchmark.py --model gpt-4-0613 --task s
Run the jailbreak attack with ArtPrompt. For more details, please refer to `baseline.py`:
    cd jailbreak
    python baseline.py --model gpt-4-0613 --tmodel gpt-3.5-turbo-0613
You can use the `--mp` argument to parallelize inference across the available CPU cores on your machine and reduce overall run time; see the sketch below.
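The sketch below is a hypothetical illustration of the kind of pool-based parallelism such a flag enables, not the repo's actual implementation (see `jailbreak/baseline.py` for that):

```python
# Hypothetical sketch of multiprocessing-based inference; query_model is
# a stand-in for a real per-prompt API call to the target model.
from multiprocessing import Pool

def query_model(prompt: str) -> str:
    return f"response to: {prompt}"  # placeholder response

if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(16)]
    # Pool() defaults to one worker per available CPU core.
    with Pool() as pool:
        responses = pool.map(query_model, prompts)
    print(f"collected {len(responses)} responses")
```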
Our project builds upon work from python-art, llm-attack, AutoDAN, PAIR, DeepInception, LLM-Finetuning-Safety, and BPE-Dropout. We appreciate these open-source contributions from the community.
If you find our project useful in your research, please consider citing:
    @misc{jiang2024artprompt,
        title={ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs},
        author={Fengqing Jiang and Zhangchen Xu and Luyao Niu and Zhen Xiang and Bhaskar Ramasubramanian and Bo Li and Radha Poovendran},
        year={2024},
        eprint={2402.11753},
        archivePrefix={arXiv},
        primaryClass={cs.CL}
    }