/rlhf-trl

Reinforcement Learning from Human Feedback with 🤗 TRL

Primary Language: Python
