CircRL

Tools for applying circuits-style interpretability techniques to RL agents.

Primary language: Python. License: MIT.