A curated list of reinforcement learning with human feedback resources (continually updated)

Awesome RLHF (RL with Human Feedback)

This is a collection of research papers on Reinforcement Learning with Human Feedback (RLHF). The repository will be continuously updated to track the frontier of RLHF.

Feel free to follow and star the repository!

Overview of RLHF

The idea of RLHF is to use methods from reinforcement learning to directly optimize a language model with human feedback. RLHF has made it possible to begin aligning a model trained on a general corpus of text data with complex human values.

  • RLHF for Large Language Model (LLM)

  • RLHF for Video Game (e.g. Atari)

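
As a rough illustration of how human feedback can enter the optimization for language models, the sketch below shows two pieces that commonly appear in RLHF pipelines: a pairwise preference loss for training a reward model, and a reward with a KL-style penalty that keeps the tuned policy close to a reference model. This is a minimal sketch under stated assumptions, not the method of any specific paper or codebase in this list. The function names, the Bradley-Terry formulation, and the `beta` coefficient are all assumptions introduced here for illustration.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the human-preferred response wins.

    Assumes a Bradley-Terry style model: P(chosen > rejected) =
    sigmoid(reward_chosen - reward_rejected).
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def kl_shaped_reward(
    reward: float,
    logp_policy: float,
    logp_reference: float,
    beta: float = 0.1,  # hypothetical penalty weight
) -> float:
    """Reward-model score minus a penalty for drifting from the reference model."""
    return reward - beta * (logp_policy - logp_reference)


# The loss shrinks when the reward model scores the preferred response higher.
loss_agree = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
loss_disagree = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(loss_agree < loss_disagree)
```

In this sketch the reward model is trained by minimizing `preference_loss` over human comparisons, and the policy is then optimized against `kl_shaped_reward` with an RL algorithm such as PPO.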

Detailed Explanation

(The following section was automatically generated by ChatGPT)

RLHF typically refers to "Reinforcement Learning with Human Feedback". Reinforcement Learning (RL) is a type of machine learning that involves training an agent to make decisions based on feedback from its environment. In RLHF, the agent also receives feedback from humans in the form of ratings or evaluations of its actions, which can help it learn more quickly and accurately.
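
To make the idea of learning from human ratings concrete, here is a toy sketch in which an agent picks among a few actions and updates its value estimates from the ratings it receives. It is a deliberately simplified epsilon-greedy bandit, not a full RLHF system. The action names, the `simulated_rater` function, and the 1 to 5 rating scale are hypothetical stand-ins for a real human rater.

```python
import random


def train_from_ratings(actions, rate, steps=500, epsilon=0.1, seed=0):
    """Estimate the value of each action from ratings using epsilon-greedy selection."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}
    count = {a: 0 for a in actions}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)  # explore
        else:
            action = max(actions, key=lambda a: value[a])  # exploit
        rating = rate(action)  # feedback from the (simulated) human
        count[action] += 1
        # Incremental mean of the ratings observed for this action.
        value[action] += (rating - value[action]) / count[action]
    return value


def simulated_rater(action):
    """Hypothetical rater: a fixed 1-5 score per action."""
    return {"rude": 1.0, "terse": 3.0, "helpful": 5.0}[action]


values = train_from_ratings(["rude", "terse", "helpful"], simulated_rater)
best = max(values, key=values.get)
print(best)
```

With enough exploration, the agent should settle on the action the rater scores highest. Real human ratings are noisy and costly, which is one reason practical systems often train a reward model on them instead of querying a person at every step.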

RLHF is an active research area in artificial intelligence, with applications in fields such as robotics, gaming, and personalized recommendation systems. It seeks to address the challenges of RL in scenarios where the agent has limited access to feedback from the environment and requires human input to improve its performance.

Several related techniques have been developed to improve the performance of RLHF systems. Here are some examples:

  • Inverse Reinforcement Learning (IRL): IRL is a technique that allows the agent to infer a reward function from observed human behavior, rather than relying on a pre-defined reward function. This makes it possible for the agent to learn from more complex feedback signals, such as demonstrations of desired behavior.

  • Apprenticeship Learning: Apprenticeship learning is a technique that combines IRL with supervised learning to enable the agent to learn from both human feedback and expert demonstrations. This can help the agent learn more quickly and effectively, as it is able to learn from both positive and negative feedback.

  • Interactive Machine Learning (IML): IML is a technique that involves active interaction between the agent and the human expert, allowing the expert to provide feedback on the agent's actions in real-time. This can help the agent learn more quickly and efficiently, as it can receive feedback on its actions at each step of the learning process.

  • Human-in-the-Loop Reinforcement Learning (HITLRL): HITLRL is a technique that involves integrating human feedback into the RL process at multiple levels, such as reward shaping, action selection, and policy optimization. This can help to improve the efficiency and effectiveness of the RLHF system by taking advantage of the strengths of both humans and machines.
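
As one concrete illustration of the reward shaping mentioned above, the sketch below adds an optional human feedback signal to the environment reward before a standard tabular Q-learning update. This is only a minimal sketch of one possible integration point. The `shaped_reward` and `q_update` helpers, the `weight` parameter, and the state and action names are hypothetical and do not come from any codebase listed here.

```python
from collections import defaultdict


def shaped_reward(env_reward, human_feedback, weight=0.5):
    """Combine the environment reward with a human signal (None when absent)."""
    if human_feedback is None:
        return env_reward
    return env_reward + weight * human_feedback


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step using the (possibly shaped) reward."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)  # unseen state-action pairs start at 0.0
actions = ["left", "right"]

# The environment gives no reward yet, but a human approves of the move (+1).
reward = shaped_reward(env_reward=0.0, human_feedback=1.0)
updated = q_update(q, "s0", "right", reward, "s1", actions)
print(updated)
```

Here the human signal nudges the value estimate before the environment has provided any reward, which is the kind of speed-up that motivates human-in-the-loop approaches when environment rewards are sparse.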

Here are some example applications of Reinforcement Learning with Human Feedback (RLHF):

  • Game Playing: In game playing, human feedback can help the agent learn strategies and tactics that are effective in different game scenarios. For example, in the popular game of Go, human experts can provide feedback to the agent on its moves, helping it improve its gameplay and decision-making.

  • Personalized Recommendation Systems: In recommendation systems, human feedback can help the agent learn the preferences of individual users, making it possible to provide personalized recommendations. For example, the agent could use feedback from users on recommended products to learn which features are most important to them.

  • Robotics: In robotics, human feedback can help the agent learn how to interact with the physical environment in a safe and efficient manner. For example, a robot could learn to navigate a new environment more quickly with feedback from a human operator on the best path to take or which objects to avoid.

  • Education: In education, human feedback can help the agent learn how to teach students more effectively. For example, an AI-based tutor could use feedback from teachers on which teaching strategies work best with different students, helping to personalize the learning experience.

Papers

format:
- [title](paper link) [links]
  - author1, author2, and author3...
  - publisher
  - keyword
  - code
  - experiment environments and datasets

2023

2022

2021

2020 and before

Codebases

format:
- [title](codebase link) [links]
  - author1, author2, and author3...
  - keyword
  - experiment environments, datasets or tasks
  • LLaMA2 + RLHF - DeepSpeed + Ray
    • OpenLLMAI
    • Keyword: LLaMA2, RLHF, DeepSpeed, Ray
    • Task: Open-source implementation of Industrial-grade High-performance LLaMA2 RLHF including PPO/RS, etc.
  • PaLM + RLHF - Pytorch
    • Phil Wang, Yachine Zahidi, Ikko Eltociear Ashimine, Eric Alcaide
    • Keyword: Transformers, PaLM architecture
    • Dataset: enwik8
  • lm-human-preferences
    • Daniel M. Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, Geoffrey Irving
    • Keyword: Reward learning for language, Continuing text with positive sentiment, Summary task, Physical descriptive
    • Dataset: TL;DR, CNN/DM
  • following-instructions-human-feedback
    • Long Ouyang, Jeff Wu, Xu Jiang, et al.
    • Keyword: Large Language Model, Align Language Model with Human Intent
    • Dataset: TruthfulQA, RealToxicityPrompts
  • Transformer Reinforcement Learning (TRL)
    • Leandro von Werra, Younes Belkada, Lewis Tunstall, et al.
    • Keyword: Train LLM with RL, PPO, Transformer
    • Task: IMDB sentiment
  • Transformer Reinforcement Learning X (TRLX)
    • Jonathan Tow, Leandro von Werra, et al.
    • Keyword: Distributed training framework, T5-based language models, Train LLM with RL, PPO, ILQL
    • Task: Fine tuning LLM with RL using provided reward function or reward-labeled dataset
  • RL4LMs (A modular RL library to fine-tune language models to human preferences)
  • LaMDA-rlhf-pytorch
    • Phil Wang
    • Keyword: LaMDA, Attention-mechanism
    • Task: Open-source pre-training implementation of Google's LaMDA research paper in PyTorch
  • TextRL
    • Eric Lam
    • Keyword: Hugging Face Transformers
    • Task: Text generation
    • Env: PFRL, gym
  • minRLHF
    • Thomfoster
    • Keyword: PPO, Minimal library
    • Task: educational purposes
  • DeepSpeed-Chat
    • Microsoft
    • Keyword: Affordable RLHF Training
  • Dromedary
    • IBM
    • Keyword: Minimal human supervision, Self-aligned
    • Task: Self-aligned language model trained with minimal human supervision
  • FG-RLHF
    • Zeqiu Wu, Yushi Hu, Weijia Shi, et al.
    • Keyword: Fine-Grained RLHF, providing a reward after every segment, Incorporating multiple RMs associated with different feedback types
    • Task: A framework that enables training and learning from reward functions that are fine-grained in density and multiple RMs
  • Safe-RLHF
    • Xuehai Pan, Ruiyang Sun, Jiaming Ji, et al.
    • Keyword: Support popular pre-trained models, Large human-labeled dataset, Multi-scale metrics for safety constraints verification, Customized parameters
    • Task: Constrained Value-Aligned LLM via Safe RLHF

Dataset

format:
- [title](dataset link) [links]
  - author1, author2, and author3...
  - keyword
  - experiment environments or tasks
  • HH-RLHF
    • Ben Mann, Deep Ganguli
    • Keyword: Human preference dataset, Red teaming data, machine-written
    • Task: Open-source dataset for human preference data about helpfulness and harmlessness
  • Stanford Human Preferences Dataset (SHP)
    • Kawin Ethayarajh, Heidi Zhang, Yizhong Wang, Dan Jurafsky
    • Keyword: Naturally occurring and human-written dataset, 18 different subject areas
    • Task: Intended to be used for training RLHF reward models
  • PromptSource
    • Stephen H. Bach, Victor Sanh, Zheng-Xin Yong et al.
    • Keyword: Prompted English datasets, Mapping a data example into natural language
    • Task: Toolkit for creating, sharing and using natural language prompts
  • Structured Knowledge Grounding (SKG) Resources Collections
    • Tianbao Xie, Chen Henry Wu, Peng Shi et al.
    • Keyword: Structured Knowledge Grounding
    • Task: Collection of datasets related to structured knowledge grounding
  • The Flan Collection
    • Shayne Longpre, Le Hou, Tu Vu et al.
    • Task: Collection compiles datasets from Flan 2021, P3, Super-Natural Instructions
  • rlhf-reward-datasets
    • Yiting Xie
    • Keyword: Machine-written dataset
  • webgpt_comparisons
    • OpenAI
    • Keyword: Human-written dataset, Long form question answering
    • Task: Train a long form question answering model to align with human preferences
  • summarize_from_feedback
    • OpenAI
    • Keyword: Human-written dataset, summarization
    • Task: Train a summarization model to align with human preferences
  • Dahoas/synthetic-instruct-gptj-pairwise
    • Dahoas
    • Keyword: Human-written dataset, synthetic dataset
  • Stable Alignment - Alignment Learning in Social Games
    • Ruibo Liu, Ruixin (Ray) Yang, Qiang Peng
    • Keyword: Interaction data used for alignment training, Run in Sandbox
    • Task: Train on the recorded interaction data in simulated social games
  • LIMA
    • Meta AI
    • Keyword: without any RLHF, few carefully curated prompts and responses
    • Task: Dataset used for training the LIMA model

Blogs

Other Language Support

Turkish

Contributing

Our goal is to make this repo even better. If you are interested in contributing, please refer to HERE for instructions on how to contribute.

License

Awesome RLHF is released under the Apache 2.0 license.