Deep Q-Learning from Demonstrations (DQfD)

This repository contains an implementation of the learning algorithm proposed in Deep Q-Learning from Demonstrations (Hester et al. 2018) for solving Atari 2600 video games using a combination of reinforcement learning and imitation learning techniques.
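For reference, the objective in Hester et al. (2018) combines a one-step (double) Q-learning loss, an n-step return loss, a large-margin supervised loss on the demonstration data, and an L2 regularization term. The sketch below only illustrates how these terms are weighted and summed; the function and parameter names are illustrative and not taken from this repository:

def dqfd_loss(j_dq, j_n, j_e, j_l2,
              lambda_1=1.0, lambda_2=1.0, lambda_3=1e-5):
    """Combined DQfD loss J(Q) = J_DQ + lambda_1 * J_n + lambda_2 * J_E + lambda_3 * J_L2."""
    # j_dq: one-step (double) Q-learning loss
    # j_n:  n-step return loss
    # j_e:  large-margin supervised loss on demonstration transitions
    # j_l2: L2 regularization on the network weights
    return j_dq + lambda_1 * j_n + lambda_2 * j_e + lambda_3 * j_l2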

Note: The implementation is part of my Bachelor's thesis Tiefes Q-Lernen mit Demonstrationen (Deep Q-Learning from Demonstrations).

Table of Contents

Installation

Usage

Some Experiments

Task List

License

Installation

1. Clone the repository

To clone the repository, open a terminal, change to the directory in which you want to store the project, and run:

$ git clone https://github.com/felix-kerkhoff/DQfD.git

2. Create a virtual environment

Installing the GPU version of TensorFlow from source with the proper NVIDIA driver and CUDA libraries can be quite tricky. I recommend using Anaconda/Miniconda to create a virtual environment for the necessary packages, as conda will automatically install the right CUDA libraries. So type

$ conda create --name atari_env

to create an environment called atari_env. If you already have a working TensorFlow 2 installation, you can of course also use venv and pip to create the virtual environment and install the packages.
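In that case, a venv-based setup might, for example, look like the following (assuming a working Python 3 installation with pip; the environment name is arbitrary):

$ python3 -m venv atari_env
$ source atari_env/bin/activate
$ pip install -r requirements.txt

Note that installing with pip requires the small changes to requirements.txt described in the note under step 3.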

3. Install the required packages

To install the packages, we first have to activate the environment by typing:

$ conda activate atari_env

Then install the packages specified in requirements.txt by running the following command from your project directory:

$ conda install --file requirements.txt -c conda-forge -c powerai

Note:

  • If you want to use pip for the installation, you will need to make the following changes to the requirements.txt file (see the excerpt after this list):

    • replace the line atari_py==0.2.6 with atari-py==0.2.6
    • replace the line opencv==4.4.0 with opencv-python==4.4.0
  • To be able to compile the Cython modules, make sure you have a working C/C++ compiler installed. See the Cython Documentation for further information.
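For illustration, the two affected lines of a pip-compatible requirements.txt would then read as follows (all other lines stay unchanged):

atari-py==0.2.6
opencv-python==4.4.0

and the installation command becomes:

$ pip install -r requirements.txt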

Usage

To see if everything works fine, I recommend training your first agent on the game Pong, as this game needs the least training time. To do so, just run the following command in your terminal (in the directory of your project):

$ python pong_standard_experiment.py

You should see good results after about 150,000 training steps, which corresponds to about 15 minutes of computation time on my machine. By using n_step = 50 instead of n_step = 10 as the number of steps considered for the n-step loss, you can even speed up the process and get good results after less than 100,000 training steps, or 10 minutes (see the experiment in the next section). Feel free to experiment with all the other parameters and games by changing them in the respective file.

Some Experiments

Different numbers n for the n-step loss in the game Pong

With the first experiment, we try to show how the use of multi-step losses can speed up the training process in the game Pong.
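As a reminder, an n-step target sums the first n discounted rewards before bootstrapping from the value of the state reached after n steps, which propagates reward information backwards much faster than a one-step target. The following is only a conceptual sketch of such a target; the names are not taken from this repository:

def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """n-step return: sum_{k=0}^{n-1} gamma^k * r_k + gamma^n * V(s_n).

    rewards: the n rewards observed after the starting state.
    bootstrap_value: a value estimate for the state reached after n steps,
        e.g. max_a Q(s_{t+n}, a) from the target network (0 if the episode ended).
    """
    n = len(rewards)
    discounted_sum = sum(gamma ** k * r for k, r in enumerate(rewards))
    return discounted_sum + gamma ** n * bootstrap_value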

Ablations in the game Enduro

In the next experiment, we investigate the influence of the different components of n-step Prioritized Dueling Double Deep Q-Learning, using the game Enduro as an example. We do this by leaving out exactly one of the components at a time while keeping all other parameters unchanged.
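As an example of one of these components, the "double" part decouples action selection from action evaluation when computing the bootstrap target, which reduces the overestimation bias of plain Q-learning. A rough, purely illustrative sketch of that idea (not this repository's code):

import numpy as np

def double_q_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Double Q-learning target for a single transition.

    q_online_next: Q-values of the next state under the online network.
    q_target_next: Q-values of the next state under the target network.
    """
    if done:
        return reward
    best_action = int(np.argmax(q_online_next))          # selection by the online network
    return reward + gamma * q_target_next[best_action]   # evaluation by the target network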

Using demonstrations to learn Montezuma's Revenge

Due to its very sparse rewards and the need for long-term planning, Montezuma's Revenge is known to be one of the most difficult Atari 2600 games for deep reinforcement learning agents to solve, and most of them fail at this game. The use of human demonstrations might help to overcome this issue:
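On the algorithmic side, the demonstration data enters the training objective through the large-margin supervised loss of the DQfD paper, which pushes the Q-value of the demonstrated action above the Q-values of all other actions by at least a fixed margin. A minimal, purely illustrative sketch of this term (not necessarily how this repository implements it):

import numpy as np

def large_margin_loss(q_values, expert_action, margin=0.8):
    """J_E(Q) = max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E),
    where the margin function l(a_E, a) is `margin` for a != a_E and 0 for a = a_E."""
    q_values = np.asarray(q_values, dtype=float)   # Q-values of one state for all actions
    margins = np.full(q_values.shape, margin)      # penalty added to every non-expert action
    margins[expert_action] = 0.0
    return np.max(q_values + margins) - q_values[expert_action]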

Note:

  • The figures show the number of steps (i.e. the number of decisions made by the agent) on the x-axis and the scores achieved during the training process on the y-axis. The learning curves are smoothed using a moving average over intervals of 50 episodes, and the shaded areas correspond to the standard deviation within these intervals.
  • The learning curves were produced using the parameters specified in the files pong_standard_experiment.py, enduro_standard_experiment.py and montezuma_demo_experiment.py (except for the ones varied in the experiments, such as n_step in the first experiment).

Task List

License

This project is licensed under the terms of the MIT license.

Copyright (c) 2020 Felix Kerkhoff