This code is based on the work in On the Utility of Learning about Humans for Human-AI Coordination.
To play the game with trained agents, you can use Overcooked-Demo.
For more information about the Overcooked-AI environment, check out this repo.
- Human-Aware Reinforcement Learning
- Contents
- Installation
- Testing
- Repo Structure Overview
- Usage
- Troubleshooting
- Reproducing Results
When cloning the repository, make sure you also clone the submodules (this implementation is linked to specific commits of the submodules, and will mostly not work with more recent ones):
$ git clone --recursive https://github.com/HumanCompatibleAI/human_aware_rl.git
If you want to clone a specific branch with its submodules, use:
$ git clone --single-branch --branch BRANCH_NAME --recursive https://github.com/HumanCompatibleAI/human_aware_rl.git
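If you have already cloned the repository without `--recursive`, you can fetch the pinned submodules afterwards with the standard git command:

$ git submodule update --init --recursive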
For Ubuntu 18.04, follow the directions here. The only difference is the very last step: instead of running
$ sudo apt-get install cuda
Please run
$ sudo apt-get install cuda-libraries-10-0
$ sudo apt-get install cuda-10-0
Optional conda installation for Ubuntu 18.04: create a new conda environment and run the install script as before
$ conda create -n harl_rllib python=3.7
$ conda activate harl_rllib
(harl_rllib) $ ./install.sh
Finally, install the latest stable version of tensorflow compatible with rllib
(harl_rllib) $ pip install tensorflow==2.0.2
Or, if working with GPUs, install a version of tensorflow 2.x and cuDNN that is compatible with the available CUDA drivers. The following example works for CUDA 10.0. You can verify which version of CUDA is installed by running `nvcc --version`. For a full list of driver compatibility, refer here
(harl_rllib) $ pip install tensorflow-gpu==2.0.0
(harl_rllib) $ conda install -c anaconda cudnn=7.6.0
Your virtual environment should now be configured to run the rllib training code. Verify it by running the following command
(harl_rllib) $ python -c "from ray import rllib"
Note: if you ever get an import error, first check that you have activated the conda env
If setup was successful, all unit tests and local reproducibility tests should pass. They can be run as follows
You can run all the tests with
(harl_rllib) $ ./run_tests.sh
Highest level integration tests that combine self play, bc training, and ppo_bc training
(harl_rllib) $ cd human_aware_rl/ppo
(harl_rllib) human_aware_rl/ppo $ python ppo_rllib_test.py
All tests involving creation, training, and saving of bc models. No dependency on rllib
(harl_rllib) $ cd imitation
(harl_rllib) imitation $ python behavior_cloning_tf2_test.py
Tests rllib environments and models, as well as various utility functions. Does not actually test rllib training
(harl_rllib) $ cd rllib
(harl_rllib) rllib $ python tests.py
You should see all tests passing.
Note: the tests are broken up into separate files because they rely on different tensorflow execution states (i.e. the bc tests run tf in eager mode, while rllib requires tensorflow to be running symbolically). Going forward, it would probably be best to standardize the tensorflow execution state, or rewrite the code such that it is robust to the execution state.
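If you are unsure which execution state a given process is in, a quick diagnostic like the following can help (this is not part of the repo, just an illustrative sketch):

```python
import tensorflow as tf

# Report the current execution state: True means eager (what the bc tests
# expect), False means graph/symbolic (what the rllib code expects).
print("eager:", tf.executing_eagerly())

# Graph mode can be forced explicitly, but note that once the execution
# state is set for a process it cannot be switched back.
tf.compat.v1.disable_eager_execution()
print("eager:", tf.executing_eagerly())
```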
- `ppo/`:
  - `ppo_rllib.py`: Primary module where code for training a PPO agent resides. This includes an rllib-compatible wrapper on `OvercookedEnv`, utilities for converting rllib `Policy` classes to Overcooked `Agent`s, as well as utility functions and callbacks
  - `ppo_rllib_client.py`: Driver code for configuring and launching the training of an agent. More details about usage below
  - `ppo_rllib_from_params_client.py`: Train one agent with PPO in Overcooked with variable-MDPs
  - `ppo_rllib_test.py`: Reproducibility tests for local sanity checks
- `rllib/`:
  - `rllib.py`: rllib agent and training utils that utilize Overcooked APIs
  - `utils.py`: Utils for the above
  - `tests.py`: Preliminary tests for the above
- `imitation/`:
  - `behavior_cloning_tf2.py`: Module for training, saving, and loading a BC model
  - `behavior_cloning_tf2_test.py`: Contains basic reproducibility tests as well as unit tests for the various components of the bc module
- `human/`:
  - `process_data.py`: Script to process human data in specific formats to be used by DRL algorithms
  - `data_processing_utils.py`: Utils for the above
- `utils.py`: Utils for the repo
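For orientation, the modules above live under the `human_aware_rl` package and can be imported via the paths implied by this layout. A minimal sketch, assuming the package and its dependencies were installed via `./install.sh` (the symbols each module actually exports are not shown here):

```python
# Import paths follow the directory layout above; this is only an
# illustration of where things live, not a usage example of their APIs.
from human_aware_rl import utils                            # repo-wide utilities
from human_aware_rl.imitation import behavior_cloning_tf2   # BC training/saving/loading
from human_aware_rl.rllib import rllib as overcooked_rllib  # rllib agent and training utils
from human_aware_rl.ppo import ppo_rllib                    # PPO training module
```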
Before proceeding, it is important to note that there are two primary groups of hyperparameter defaults, `local` and `production`. Which one is selected is controlled by the `RUN_ENV` environment variable, which defaults to `production`. In order to use local hyperparameters, run
$ export RUN_ENV=local
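For reference, the selection boils down to a pattern like the following (a hypothetical sketch; the actual logic and default values live in the `my_config` section of `ppo_rllib_client.py` and may differ):

```python
import os

# Hypothetical sketch of how RUN_ENV picks a hyperparameter group;
# the values below are illustrative, not the repo's real defaults.
LOCAL_TESTING = os.environ.get("RUN_ENV", "production") == "local"

num_training_iters = 2 if LOCAL_TESTING else 400
num_workers = 1 if LOCAL_TESTING else 10
```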
Training of agents is done through the `ppo_rllib_client.py` script. It has the following usage:
ppo_rllib_client.py [with [<param_0>=<argument_0>] ... ]
For example, the following snippet trains a self-play PPO agent on seeds 1, 2, and 3, with learning rate `1e-3`, on the `cramped_room` layout for `5` iterations, without using any GPUs. The rest of the parameters are left at their defaults
(harl_rllib) ppo $ python ppo_rllib_client.py with seeds="[1, 2, 3]" lr=1e-3 layout_name=cramped_room num_training_iters=5 num_gpus=0 experiment_name="my_agent"
For a complete list of all hyperparameters as well as their local and production defaults, refer to the `my_config` section of `ppo_rllib_client.py`
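Since the overrides use sacred-style `with` syntax, you can likely also print the fully resolved configuration from the command line (assuming a standard sacred experiment setup; this command is not part of the original instructions)

(harl_rllib) ppo $ python ppo_rllib_client.py print_config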
Training results and checkpoints are stored in a directory called `~/ray_results/my_agent_<seed>_<timestamp>`. You can visualize the results using tensorboard
(harl_rllib) $ cd ~/ray_results
(harl_rllib) ray_results $ tensorboard --logdir .
Many tensorflow errors are caused by the tensorflow execution state. For example, if you get an error similar to
ValueError: Could not find matching function to call loaded from the SavedModel. Got:
Positional arguments (1 total):
* Tensor("inputs:0", shape=(1, 62), dtype=float64)
Keyword arguments: {}
or
NotImplementedError: Cannot convert a symbolic Tensor (model_1/logits/BiasAdd:0) to a numpy array.
or
TypeError: Variable is unhashable. Instead, use tensor.ref() as the key.
It is likely because the code you are running relies on tensorflow executing symbolically (or eagerly), while it is actually executing eagerly (or symbolically).
This can usually be fixed by changing the order of imports: `import tensorflow as tf` enables eager execution, while any `rllib` import disables it. Once the execution state has been set, it cannot be changed. For example, if you require eager execution, make sure `import tensorflow as tf` comes BEFORE `from ray import rllib`, and vice versa.
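As an illustration of the import-order rule (a sketch only, not repo code):

```python
# If you require eager execution (e.g. for the bc code), import tensorflow
# before any rllib import:
import tensorflow as tf   # eager execution is enabled on import
from ray import rllib     # rllib imports switch to graph/symbolic execution

# If you require symbolic execution instead (e.g. for rllib training),
# reverse the order:
#     from ray import rllib
#     import tensorflow as tf
```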
If you encounter `ModuleNotFoundError: No module named 'human_aware_rl.data_dir'`, please run `./run_tests.sh` to initialize those variables.
The specific results in that paper were obtained using code that is no longer in the master branch. If you are interested in reproducing results, please check out this branch and follow the install instructions there.