/ravel

Evaluate interpretability methods on localizing and disentangling concepts in LLMs.

Primary language: Python
License: MIT
