A curated list of resources for Reinforcement Learning from Human Feedback (RLHF) and Language Models.
Reinforcement learning from human feedback (RLHF) has gained popularity with ChatGPT, which combines language models with RLHF.
The paper Transformer models: an introduction and catalog contains a very comprehensive catalog of existing language models.
- Anthropic
- OpenAI ChatGPT
- ChatGPT (https://openai.com/blog/chatgpt/)
- InstructGPT (https://openai.com/research/instruction-following)
- Google Bard
- Reinforcement Learning from Human Feedback: From Zero to ChatGPT
- CS224n: Natural Language Processing with Deep Learning course at Stanford
- Stanford CS234: Reinforcement Learning | Winter 2019
- Hugging Face Deep Reinforcement Learning Course
2022
- Fine-tuning language models to find agreement among humans with diverse preferences
- Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
- Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
2023
- Is ChatGPT a General-Purpose Natural Language Processing Task Solver?
- The Capacity for Moral Self-Correction in Large Language Models
- Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
- https://github.com/openai/lm-human-preferences - The first code released by OpenAI to perform RLHF on language models
- https://github.com/allenai/RL4LMs - Provides easily customizable building blocks for training language models, including implementations of on-policy algorithms, reward functions, metrics, datasets, and LM-based actor-critic policies
- https://github.com/lvwerra/trl - Train transformer language models with Proximal Policy Optimization (PPO). The library is built on top of the transformers library by Hugging Face (see the sketch after this list).
- https://github.com/lucidrains/PaLM-rlhf-pytorch - Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture.
- https://github.com/CarperAI/trlx - A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
- https://github.com/voidful/TextRL - Implementation of ChatGPT-style RLHF (Reinforcement Learning with Human Feedback) on any generative model in Hugging Face's transformers
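
For orientation, here is a minimal sketch of a single PPO update with trl, assuming the pre-0.12 `PPOConfig` / `PPOTrainer` / `AutoModelForCausalLMWithValueHead` API (signatures vary across trl versions); the constant reward is only a placeholder for a trained reward model:

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_name = "gpt2"  # any causal LM from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Policy with a value head, plus a frozen reference copy for the KL penalty.
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)

config = PPOConfig(batch_size=1, mini_batch_size=1)
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

# One PPO step: generate a continuation for a prompt, score it, update the policy.
query_tensor = tokenizer.encode("Explain RLHF in one sentence:", return_tensors="pt")[0]
response_tensor = ppo_trainer.generate(
    query_tensor,
    return_prompt=False,
    max_new_tokens=32,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)[0]

# In a real run the reward comes from a reward model trained on human preferences;
# a constant tensor is used here only to keep the sketch self-contained.
reward = [torch.tensor(1.0)]
stats = ppo_trainer.step([query_tensor], [response_tensor], reward)
```

In a full RLHF pipeline the scalar reward would come from a reward model trained on preference data such as the datasets listed below.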
- https://huggingface.co/datasets/Anthropic/hh-rlhf - Human preference data about helpfulness and harmlessness
- https://huggingface.co/datasets/stanfordnlp/SHP - SHP is a dataset of 385K collective human preferences over responses to questions/instructions in 18 different subject areas, from cooking to legal advice.
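
Both preference datasets load directly with the Hugging Face `datasets` library; a minimal sketch (the `chosen`/`rejected` columns follow the hh-rlhf dataset card, while SHP's columns are printed rather than assumed):

```python
from datasets import load_dataset

# Pairwise preference data: each example holds a preferred ("chosen") and a
# non-preferred ("rejected") dialogue transcript.
hh = load_dataset("Anthropic/hh-rlhf", split="train")
print(hh[0]["chosen"][:200])

# Stanford Human Preferences: questions/instructions with two candidate replies
# and an aggregate preference label; inspect the columns before use.
shp = load_dataset("stanfordnlp/SHP", split="train")
print(shp.column_names)
```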