/nanoGPT-RL

RLHF with PPO from scratch, on GPT-2, using nanoGPT.

Primary Language: Python
License: MIT
