This is a Keras and OpenAI Gym implementation of the Deep Q-Learning algorithm (often referred to as Deep Q-Network, or DQN) by Mnih et al., applied to the well-known Atari games.
The repository has been forked from Daniel Grattola's implementation.
Rather than a pre-packaged tool that simply lets you watch the agent play, this is a model that needs to be trained and fine-tuned by hand, so it is mainly of educational value. The code tries to replicate the experimental setup described in the original DeepMind paper.
Make sure to cite the paper by Mnih et al. if you use this code for your research:
@article{mnih2015human,
title={Human-level control through deep reinforcement learning},
author={Mnih, Volodymyr and Kavukcuoglu, Koray and Silver, David and Rusu, Andrei A and Veness, Joel and Bellemare, Marc G and Graves, Alex and Riedmiller, Martin and Fidjeland, Andreas K and Ostrovski, Georg and others},
journal={Nature},
volume={518},
number={7540},
pages={529--533},
year={2015},
publisher={Nature Research}
}
You can play with this repo on Colab
To run the script you'll need a few Python dependencies (at minimum Keras and OpenAI Gym with the Atari environments), which should all be available through pip.
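An install along these lines should work, although the exact package set is an assumption and may need adjusting for your setup:

```
pip install keras "gym[atari]"
```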
No additional setup is needed, so simply clone the repo:
git clone https://gitlab.com/yashkotadia/deep-q-atari.git
cd deep-q-atari
A default training session can be run by typing:
python atari.py -t
which will train the model with the same parameters as described in
this Nature article,
on the MsPacmanDeterministic-v4
environment.
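If you only want a quick dry run that produces no output files (for example, to check that your setup works), the debug flag described in the options below can be combined with training:

```
python atari.py -t -d
```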
By running:
python atari.py -h
you'll see the options list. The possible options are:
-t, --train
: train the agent;
-l, --load
: load the neural network weights from the given path;
-v, --video
: show video output;
-d, --debug
: run in debug mode (no output files);
--eval
: evaluate the agent;
-e, --environment
: name of the OpenAI Gym environment to use
(default: MsPacman-v0);
--minibatch-size
: number of samples used to train the DQN at each update;
--replay-memory-size
: number of samples stored in the replay memory;
--target-network-update-freq
: frequency (number of frames) with which
the target DQN is updated;
--avg-val-computation-freq
: frequency (number of DQN updates) with which the
average reward and Q value are computed;
--discount-factor
: discount factor for the environment;
--update-freq
: frequency (number of steps) with which to train the DQN;
--learning-rate
: learning rate for the DQN;
--epsilon
: initial exploration rate for the agent;
--min-epsilon
: final exploration rate for the agent;
--epsilon-decrease
: rate at which to linearly decrease epsilon (see the sketch after this list);
--replay-start-size
: minimum number of transitions (with fully random policy)
to store in the replay memory before starting training;
--initial-random-actions
: number of random actions to be performed by the
agent at the beginning of each episode;
--dropout
: dropout rate for the DQN;
--max-episodes
: maximum number of episodes that the agent can experience
before quitting;
--max-episode-length
: maximum number of steps in an episode;
--max-frames-number
: maximum number of frames for a run;
--test-freq
: frequency (number of episodes) with which to test the agent's
performance;
--validation-frames
: number of frames on which to test the model, as in Table 3 of the paper;
--test-states
: number of states on which to compute the average Q value;
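As a side note on the exploration flags above, the combination of --epsilon, --min-epsilon and --epsilon-decrease describes a linear annealing schedule. A minimal sketch of the usual interpretation (this is an illustration, not code from this repository, and the numbers are only an example) is:

```python
def epsilon_at_step(step, epsilon, epsilon_decrease, min_epsilon):
    """Linearly anneal the exploration rate, never going below min_epsilon."""
    return max(min_epsilon, epsilon - epsilon_decrease * step)

# Example: starting at 1.0 and decreasing by 9e-7 per step, epsilon reaches
# its floor of 0.1 after roughly one million steps.
print(epsilon_at_step(1_000_000, epsilon=1.0, epsilon_decrease=9e-7, min_epsilon=0.1))
```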
The possible environments on which the agent can be trained are all the environments in the Atari gym package. A typical usage of this script on a headless server (e.g. an EC2 instance) would look like this:
python atari.py -t -e BreakoutDeterministic-v4
If you want to see the actual game being played by the agent, simply add the -v
flag to the above command (note that this will obviously slow down the collection of
samples).
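The hyperparameter flags listed above can be combined with training in the same way; the values in this example are only illustrative (roughly in line with the Nature paper) and are not necessarily the script's defaults:

```
python atari.py -t -e BreakoutDeterministic-v4 \
    --learning-rate 0.00025 \
    --discount-factor 0.99 \
    --replay-memory-size 1000000 \
    --target-network-update-freq 10000
```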
You'll find some CSV files in the output folder of the run (output/runYYYYMMDD-hhmmss), which contain raw data for analyzing the agent's performance.
More specifically, the following files will be produced as output:
- training_info.csv: will contain the episode length and cumulative (non-clipped) reward of each training episode;
- evaluation_info.csv: will contain the episode length and cumulative (non-clipped) reward of each evaluation episode;
- training_history.csv: will contain the average loss and accuracy for each training step, as returned by the fit method of Keras;
- test_score_mean_q_info.csv: will contain the average score and Q-value (computed over a number of held-out random states defined by the --test-states flag), calculated at intervals of N DQN updates (where N is set by the --avg-val-computation-freq flag);
- log.txt: a text file with various information about the parameters of the run and the progress of the model;
- model_DQN.h5, model_DQN_target.h5: files containing the weights of the DQN and target DQN (both files will be saved when the script quits or is killed with ctrl+c). You can pass either of these files as an argument with the --load flag to initialize a new DQN with those weights (note: the DQN architecture must be unchanged for this to work).
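For example, assuming a previous run saved its weights under output/runYYYYMMDD-hhmmss, evaluating an agent initialized from those weights could look like this (the folder name is a placeholder for an actual run directory, and the environment should match the one used for training):

```
python atari.py --eval -l output/runYYYYMMDD-hhmmss/model_DQN.h5 -e MsPacmanDeterministic-v4
```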
The output can also be visualized on the WandB project dashboard.
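If you prefer to inspect the CSV files locally (with or without WandB), a few lines of pandas are enough. The exact column names are not documented above, so the sketch below prints them first and only assumes that the cumulative reward is the last column of training_info.csv:

```python
import pandas as pd
import matplotlib.pyplot as plt

# The folder name is a placeholder: point it at one of your own run directories.
df = pd.read_csv("output/runYYYYMMDD-hhmmss/training_info.csv")

# Check the actual headers before relying on any particular column.
print(df.columns.tolist())

# Assuming the cumulative (non-clipped) reward is the last column,
# plot it against the episode index.
df.iloc[:, -1].plot(title="Cumulative reward per training episode")
plt.xlabel("Episode")
plt.ylabel("Cumulative reward")
plt.show()
```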