Raccoon is a test bench for prompt extraction attacks on LLM-Integrated Applications. With the proliferation of LLM-integrated applications such as GPT-s, millions are deployed, offering valuable services through proprietary instruction prompts. These systems, however, are prone to prompt extraction attacks through meticulously designed queries. To help mitigate this problem, we introduce the Raccoon benchmark which comprehensively evaluates a model's susceptibility to prompt extraction attacks.
We evaluate systems
- Under both defenseless and defended scenarios, employing a dual approach to evaluate the effectiveness of existing defenses and the resilience of the models.
- The benchmark encompasses 14 categories of prompt extraction attacks, with additional compounded attacks that closely mimic the strategies of potential attackers.
- A diverse collection of defense templates. This array is, to our knowledge, the most extensive compilation of prompt theft attacks and defense mechanisms to date.
- [05/24/2024] Publish initial release of Raccoon (GPTs data will be released soon)
- [05/17/2024] Our paper has been accepted by ACL 2024 Findings
$ conda create --name <env> --file requirements.txt
python run_raccoon_gang.py \
--model_name gpt-3.5-0125 \
--gpts_path "./Data/gpts/gpts196" \
--attack_path "./Data/attacks/singular_attacks" \
--ref_def_path "./Data/reference/gpts196_defense_prompt.json" \
--def_tmpl_path "./Data/defenses/defense_template.json" \
--use_sys_template \
--use_defenseless_user_prompt
python run_raccoon_gang.py \
--model_name gpt-3.5-0125 \
--gpts_path "./Data/gpts/gpts196" \
--attack_path "./Data/attacks/singular_attacks_deflesstop5" \
--ref_def_path "./Data/reference/gpts196_defense_prompt.json" \
--def_tmpl_path "./Data/defenses/defense_template.json" \
--use_sys_template \
--use_custom_defenses
python run_raccoon_gang.py \
--model_name gpt-3.5-0125 \
--gpts_path "./Data/gpts/gpts196" \
--attack_path "./Data/attacks/compound_attacks" \
--ref_def_path "./Data/reference/gpts196_defense_prompt.json" \
--def_tmpl_path "./Data/defenses/defense_template.json" \
--use_sys_template \
--use_defenseless_user_prompt
python run_raccoon_gang.py \
--model_name gpt-3.5-0125 \
--gpts_path "./Data/gpts/gpts196" \
--attack_path "./Data/attacks/compound_attacks" \
--ref_def_path "./Data/reference/gpts196_defense_prompt.json" \
--def_tmpl_path "./Data/defenses/defense_template.json" \
--use_sys_template \
--use_custom_defenses
- Loader: an iterator wrapper for loading sampled GPTs
- SysPrompt: a parser class that cleans the collected system prompt and output the prompt in customized formats
- TiktokenWrapper: a Tiktoken tokenizer wrapper used in ROUGE score calculate to support multilingual input.
- Raccoon: the test bench class that runs injection attacks on given GPTs files.