microsoft/promptbench

Question-answer template attack

ary4n99 opened this issue · 2 comments

Hey, during the attacks conducted for the paper, was the question-answer part of the prompt (i.e. \nQuestion: {content}\nAnswer:) attacked as well? Thank you!

Hi, thanks for your interest in prompt attacks! Whether that part of the prompt is attacked depends on the unmodifiable words you've set. Please check this example notebook to see how to specify unmodifiable words. To protect the template, you can simply add ['Question', 'Answer'] to the unmodifiable words.

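The attack output lists the words it treats as modifiable, in entries similar to the following. In this sample, 'Question' and 'Answer' still appear in the list, which means they have not been marked as unmodifiable: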

--------------------------------------------------
Modifiable words:  ['As', 'a', 'sentiment', 'classifier', 'determine', 'whether', 'the', 'following', 'text', 'is', 'or', 'Please', 'classify', 'Question', 'Answer']
--------------------------------------------------

Checking this list lets you confirm the setting: once 'Question' and 'Answer' no longer appear among the modifiable words, sections like \nQuestion: {content}\nAnswer: will not be attacked during the prompt attack process.
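
To make the effect of the list concrete, here is a minimal, self-contained sketch of the filtering idea. It does not call promptbench and is not the library's implementation: `modifiable_words` is a hypothetical helper, the prompt string is an assumption reconstructed from the word list above, and the default protected words ('positive', 'negative', 'content') are guessed from what is missing in that list. The real matching rules (case sensitivity, punctuation handling) may differ.

```python
import re


def modifiable_words(prompt, unmodifiable_words):
    """Hypothetical helper: distinct alphabetic words in `prompt` that are
    not protected, in order of first appearance (case-sensitive match)."""
    protected = set(unmodifiable_words)
    result = []
    for word in re.findall(r"[A-Za-z]+", prompt):
        if word not in protected and word not in result:
            result.append(word)
    return result


# Assumed prompt, reconstructed from the sample output in this thread.
prompt = (
    "As a sentiment classifier, determine whether the following text is "
    "'positive' or 'negative'. Please classify: \nQuestion: {content}\nAnswer:"
)

# Assumed baseline: label words and the placeholder are already protected.
baseline = ["positive", "negative", "content"]

# Without the extra entries, the template words are open to attack.
print("Modifiable words: ", modifiable_words(prompt, baseline))

# With them, 'Question' and 'Answer' drop out of the modifiable list.
protected = baseline + ["Question", "Answer"]
print("Modifiable words: ", modifiable_words(prompt, protected))
```

With the assumed baseline, the first call gives the same word list as the sample output above, and the second call gives that list without 'Question' and 'Answer'. In a real run, the printed "Modifiable words" entry is the thing to check.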

Thanks!