llm_defends

Code for the paper "Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM".

Primary language: Python