/weak-to-strong

Weak-to-Strong Jailbreaking on Large Language Models

Primary language: Python
License: MIT
