tml-epfl/llm-adaptive-attacks

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [arXiv, Apr 2024]

Shell · MIT

Issues

Potential BUG
#7 opened a month ago by Junjie-Chu (1 comment)
Questions about adversarial suffix generation
#8 opened a month ago by Syyabb (1 comment)
get_universal_manual_prompt template
#6 opened 2 months ago by wusuhuang (2 comments)
Question about the tokenizer's pad_token when using llama2 as target model
#5 opened 2 months ago by Kris-Lcq (2 comments)
Reproducing the experimental results
#4 opened 2 months ago by bxiong1 (10 comments)
A typo in main.py
#3 opened 5 months ago by franciscoliu (1 comment)
Question about the system prompt used for llama-2
#2 opened 5 months ago by rickyang1114 (2 comments)
How to obtain the adv_init?
#1 opened 5 months ago by xszheng2020 (2 comments)