JailbreakDefense_GoalPriority

[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization

Primary language: Python