Reinforcement Learning from Human Feedback (RLHF): Movie Reviews Example
Primary language: Jupyter Notebook. License: Apache License 2.0 (Apache-2.0).