/sparse-dictionary-learning

An open-source implementation of Anthropic's paper "Towards Monosemanticity: Decomposing Language Models with Dictionary Learning"

Primary Language: Python
