Implementation of Reinforcement Learning from Human Feedback (RLHF)
Primary language: Jupyter Notebook; License: MIT