annahdo/exploring_directions
We find concept directions in the hidden layers of an LLM and use them for classification, activation steering, and knowledge removal.
Jupyter Notebook · Apache-2.0