microsoft/promptbench

Llama2 adversarial prompts

ary4n99 opened this issue · 3 comments

The prompts for Llama 2 have not been provided in prompts/adv_prompts, so running load_adv_prompt doesn't work when using Llama 2. Could these be added, please? Thanks!

Hi, thank you for your interest in prompt attacks! We cannot provide Llama2 adversarial prompts as we have only conducted adversarial attacks on Llama1 models. However, you could try using those Llama1 adversarial prompts with Llama2 models, as our paper demonstrated their transferability.

Hi, thanks for the reply! Where can the Llama1 adversarial prompts be found? Also, why were adversarial attacks not run on Llama2?

Hi, apologies for the confusion in my previous message. We actually conducted the adversarial attacks on Llama2, not Llama1. Could you please send an email to kaijiezhu11@gmail.com? That way, I can share the Llama2 adversarial prompts with you directly.