/baby-rlhf

Simple conceptual implementation of reinforcement learning from human preferences.

Primary Language: Python
