/steering-bench

Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors"

Primary Language: Python
