Official code for the ACL 2024 paper "GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis"
Primary language: Python
License: Apache License 2.0 (Apache-2.0)