/RLHF_example

Reinforcement learning from human feedback (RLHF): movie reviews example

Primary Language: Jupyter Notebook
License: Apache License 2.0 (Apache-2.0)
