How to evaluate the defense of extract_prompts?

Question

ambarion opened this issue 4 months ago · 0 comments

Hello! I noticed that the code in scripts/extract_prompt.py does not seem to implement any actual defense. As far as I can tell, args.defense has no effect on the behavior of the code; it only adds a suffix to the run name:

if args.defense is not None:
    args.run_name += f"_d-{args.defense}"
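
To make the observation concrete, here is a minimal, self-contained sketch of the behavior I am describing. It is not the repository's actual code: the parser setup, the default run name "extract", and the helper `build_run_name` are hypothetical stand-ins, and only the `if args.defense is not None` branch is taken from the script.

```python
import argparse


def build_run_name(argv):
    """Hypothetical stand-in for the argument handling in scripts/extract_prompt.py."""
    parser = argparse.ArgumentParser()
    # Assumed flags and defaults, for illustration only.
    parser.add_argument("--run_name", default="extract")
    parser.add_argument("--defense", default=None)
    args = parser.parse_args(argv)

    # The only place the defense argument is used: it tags the run name.
    if args.defense is not None:
        args.run_name += f"_d-{args.defense}"
    return args.run_name


if __name__ == "__main__":
    print(build_run_name([]))
    print(build_run_name(["--defense", "some_defense"]))
```

In this sketch, passing `--defense` changes nothing except the name under which the run is logged, which is the behavior I see in the script.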

How can I evaluate the defenses the way the wandb sweep sweeps/extract_prompts_defense.yml in attacks/PromptLeakage does?