/BackdoorUnalign

Code for the NAACL 2024 paper "Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections".

Primary Language: Python
