/repr-preference-optimization

Align inner states, not actions, for better generalization? [WIP]

Primary language: Jupyter Notebook. License: Apache License 2.0 (Apache-2.0).
