Code for the Bachelor thesis *The Human Factor: Addressing Diversity in Reinforcement Learning from Human Feedback*.
Abstract:
Reinforcement Learning from Human Feedback (RLHF) is a promising approach to training agents to perform complex tasks by incorporating human feedback. However, the quality and diversity of this feedback can significantly impact the learning process. Humans are highly diverse in their preferences, expertise, and capabilities. This paper investigates the effects of conflicting feedback on the agent's performance. We analyse the impact of environmental complexity and examine various query selection strategies. Our results show that RLHF performance rapidly degrades with even minimal conflicting feedback in simple environments, and current query selection strategies are ineffective in handling feedback diversity. We thus conclude that addressing diversity is crucial for RLHF, suggesting alternative reward modelling approaches are needed. Full code is available on GitHub.
To install the required packages, run the following command:
pip install -r requirements.txt
This repository allows training optimal RLHF agents for the Pendulum, Lunar Lander, and Bipedal Walker environments given some probability of conflicting feedback. The agents can then be evaluated by plotting the average evaluation reward per episode and by conducting permutation tests to compare the agents' performance.
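A typical workflow, assuming the scripts run with their defaults (the conflicting-feedback probability and other hyperparameters are set in `Config.py`, described below):

```bash
# Train the RLHF agents (configuration lives in Config.py):
python train.py

# Plot the results and run the permutation tests:
python plot_results.py
```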
Relevant files:
- `train_preference_comparisons.py` includes the main methods to get the RLHF agents and train them. We also implement a custom `ConflictingSyntheticGatherer` class, which is used to generate conflicting preferences; a sketch of the idea follows this list. There are also tests in the `tests` folder.
- `train.py` includes the training loop for the agents.
- `plot_results.py` plots the results of the training, stored in the `results` folder.
- `Config.py` includes the configuration for the training process (hyperparameters, etc.).
- `helpers.py`, `environments.py`, and `graphs.py` include many auxiliary functions to help with the training and evaluation process.
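A minimal, self-contained sketch of the idea behind the `ConflictingSyntheticGatherer`, assuming synthetic preferences are computed from ground-truth fragment returns and flipped with the given conflict probability (names and signatures here are illustrative, not the repository's actual API):

```python
import numpy as np


def gather_conflicting_preferences(returns_a, returns_b, conflict_prob, seed=None):
    """Illustrative sketch of conflicting synthetic preferences.

    returns_a, returns_b: ground-truth returns of the two trajectory
    fragments in each queried pair.
    conflict_prob: probability that a query is answered by a rater
    whose preference conflicts with the ground-truth reward.
    Returns, per pair, the probability that fragment A is preferred.
    """
    rng = np.random.default_rng(seed)
    returns_a = np.asarray(returns_a, dtype=float)
    returns_b = np.asarray(returns_b, dtype=float)

    # A cooperative synthetic rater prefers the higher-return fragment,
    # and is indifferent (0.5) on ties.
    prefers_a = np.where(returns_a > returns_b, 1.0, 0.0)
    prefers_a[returns_a == returns_b] = 0.5

    # With probability conflict_prob, flip the answer to simulate
    # conflicting feedback.
    flip = rng.random(prefers_a.shape) < conflict_prob
    prefers_a[flip] = 1.0 - prefers_a[flip]
    return prefers_a
```

In the repository this behaviour is packaged as the `ConflictingSyntheticGatherer` class, so that it can stand in for the standard synthetic feedback during training.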
Note that the code logs the results to Weights & Biases (wandb), so you need to have an account and set up the API key. This key should be stored in a `Constants.py` file in the root directory:
API_WANDB_KEY = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
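For reference, a minimal way to authenticate with wandb using this key (the repository's own login code may differ):

```python
import wandb

from Constants import API_WANDB_KEY

# Authenticate with Weights & Biases before any runs are created.
wandb.login(key=API_WANDB_KEY)
```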
The agents' performance is evaluated using csv files of their mean evaluation reward per episode. These files are named after the environment and must be stored in the `results` folder. The `plot_results.py` script can then be used to plot the results of the agents: it plots the average evaluation reward per episode and conducts permutation tests to compare the agents' performance. It also includes helper methods to generate the csv files from the wandb logs.
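For illustration, a sketch of this evaluation step, assuming one csv per agent with a "reward" column (the file names and column layout here are hypothetical; the actual logic lives in `plot_results.py`):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import permutation_test

# Hypothetical file names: one csv per agent/condition, holding the
# mean evaluation reward per episode in an assumed "reward" column.
baseline = pd.read_csv("results/pendulum_conflict_0.0.csv")["reward"].to_numpy()
conflicted = pd.read_csv("results/pendulum_conflict_0.1.csv")["reward"].to_numpy()

# Plot the average evaluation reward per episode for both agents.
plt.plot(baseline, label="no conflict")
plt.plot(conflicted, label="10% conflicting feedback")
plt.xlabel("Episode")
plt.ylabel("Mean evaluation reward")
plt.legend()
plt.show()

# Two-sample permutation test on the difference in mean reward.
result = permutation_test(
    (baseline, conflicted),
    statistic=lambda x, y: np.mean(x) - np.mean(y),
    permutation_type="independent",
    n_resamples=10_000,
)
print(f"p-value: {result.pvalue:.4f}")
```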
Please cite this repository if it was useful for your research:
@mastersthesis{javi2024rlhf,
  title={The Human Factor: Addressing Diversity in Reinforcement Learning from Human Feedback},
  subtitle={How can RLHF deal with possibly conflicting feedback?},
  author={Paez Franco, Javier},
  year={2024},
  school={Delft University of Technology},
  type={Bachelor Thesis},
  url={http://resolver.tudelft.nl/uuid:a7b37b44-4798-492e-822e-f1b7c347410b},
}