/exploring_directions

We find concept directions in the hidden layers of an LLM and use them for classification, activation steering, and knowledge removal.

Primary language: Jupyter Notebook
License: Apache License 2.0 (Apache-2.0)
