/interpreting-rewards

Experiments in applying interpretability techniques to learned reward functions.

Primary language: Jupyter Notebook
