/DHS2023_RLHF

Bare-bones implementation of RLHF for fine-tuning a language model, intended to demonstrate the key concepts

Primary language: Jupyter Notebook
License: MIT
