A collection of resources on Reinforcement Learning from Human Feedback (RLHF), mainly focused on applying it to pretrained language models.
- Illustrating Reinforcement Learning from Human Feedback (RLHF): the blog post that mainly inspired this repo
- TAMER: Training an Agent Manually via Evaluative Reinforcement
- Interactive Learning from Policy-Dependent Human Feedback
- Deep Reinforcement Learning from Human Preferences [Blog]
- Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces
- Fine-Tuning Language Models from Human Preferences [Code (TensorFlow)]
- Learning to summarize with human feedback [Video]
- Recursively Summarizing Books with Human Feedback
- WebGPT: Browser-assisted question-answering with human feedback
- Training language models to follow instructions with human feedback
- Teaching language models to support answers with verified quotes
- Improving alignment of dialogue agents via targeted human judgements
- ChatGPT: Optimizing Language Models for Dialogue
- Scaling Laws for Reward Model Overoptimization
- Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
- Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
- Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning
- Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization [Code]
- Offline RL for Natural Language Generation with Implicit Language Q Learning [Code]
- Transformer Reinforcement Learning (TRL): train GPT-style transformer language models with Proximal Policy Optimization (PPO); a minimal usage sketch follows this list
- Transformer Reinforcement Learning X (TRLX): an enhanced fork of TRL that adds support for Implicit Language Q-Learning (ILQL)
- RL4LMs (a modular RL library to fine-tune language models to human preferences) [Site]: thoroughly tested and benchmarked with over 2,000 experiments on language-generation tasks, covering a range of metrics and several RL algorithms; also supports seq2seq models (e.g., T5, BART)
- Learning Task Specifications for Reinforcement Learning from Human Feedback
- Reinforcement Learning from Human Feedback: From Zero to ChatGPT
- TODO: add more descriptions for the entries above
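
For a sense of how these libraries are used in practice, here is a minimal PPO fine-tuning sketch modeled on TRL's quickstart. The exact API is version-dependent, and the constant reward is a placeholder standing in for a learned reward model:

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Policy model, frozen reference model (used for the KL penalty), and tokenizer.
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

ppo_config = PPOConfig(batch_size=1, mini_batch_size=1)
ppo_trainer = PPOTrainer(ppo_config, model, ref_model, tokenizer)

# Encode a query and sample a response from the current policy.
query_tensor = tokenizer.encode("This morning I went to the ", return_tensors="pt")
response_tensor = ppo_trainer.generate(list(query_tensor), return_prompt=False, max_new_tokens=20)

# Placeholder reward: a real RLHF setup scores the response with a reward
# model trained on human preference comparisons.
reward = [torch.tensor(1.0)]

# One PPO optimization step on the (query, response, reward) triple.
train_stats = ppo_trainer.step([query_tensor[0]], [response_tensor[0]], reward)
```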
If you have any questions, please feel free to contact me (📧: andy.yangzhen@gmail.com).