
A really simple implementation of DQN in PyTorch for Gym environments


Simple implementation of a DQN and Advantage Critic

A simple implementation of DQN and Advantage-Critic in PyTorch for Gym environments. The ICM work is slightly rough, as this repository was only used as a playground to test it for other work. The code needs refactoring to remove redundant methods and accommodate the ICM properly, since most of it was written without the ICM in mind.


  • agent.py - contains code for agents and models
  • utils.py - contains code for replay memory
  • environment.py - contains code for training and evaluating agents
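
For readers unfamiliar with replay memory, here is a minimal sketch of a uniform (non-prioritised) buffer. It is not the code in utils.py: the `ReplayMemory` class, the `Transition` field names, and the `push`/`sample` methods are hypothetical and only illustrate the general idea of storing transitions and sampling random batches from them.

```python
import random
from collections import deque, namedtuple

# Hypothetical field names; the repository's utils.py may store transitions differently.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class ReplayMemory:
    """Fixed-size buffer that drops the oldest transition once it is full."""

    def __init__(self, capacity):
        # A deque with maxlen evicts the oldest item automatically on append.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        """Store one transition."""
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Return batch_size transitions drawn uniformly without replacement."""
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

A training loop would typically call `push` after every environment step and start calling `sample` once `len(memory)` is at least the batch size.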


To use, run the environment.py script with the arguments listed below, e.g.:

python environment.py -ud


usage: environment.py [-h] [--gym_env GYM_ENV] [--hidden_size HIDDEN_SIZE] [--gamma GAMMA] [--learning_rate LEARNING_RATE]
                     [--epsilon_start EPSILON_START] [--epsilon_end EPSILON_END] [--epsilon_anneal EPSILON_ANNEAL]
                     [--batch_size BATCH_SIZE] [--replay_memory_size REPLAY_MEMORY_SIZE] [--use_PER] [--use_DQN]
                     [--START_RENDERING START_RENDERING] [--update_frequency UPDATE_FREQUENCY]

Train a agent on gym environments

optional arguments:
  -h, --help            show this help message and exit
  --gym_env GYM_ENV, -g GYM_ENV
                        The name of the gym environment
  --hidden_size HIDDEN_SIZE, -hs HIDDEN_SIZE
                        size of the hidden layer
  --gamma GAMMA, -gm GAMMA
                        The discount factor used by the agent
  --learning_rate LEARNING_RATE, -lr LEARNING_RATE
                        The learning rate used by the optimizer
  --epsilon_start EPSILON_START, -es EPSILON_START
                        The starting value for epsilon in epsilon-greedy
  --epsilon_end EPSILON_END, -ee EPSILON_END
                        The ending value for epsilon in epsilon-greedy
  --epsilon_anneal EPSILON_ANNEAL, -en EPSILON_ANNEAL
                        The number of steps to which the epsilon anneals down
  --batch_size BATCH_SIZE, -bs BATCH_SIZE
                        Batch size to use in DQN
  --replay_memory_size REPLAY_MEMORY_SIZE, -re REPLAY_MEMORY_SIZE
                        Size of replay memory
  --use_PER, -up        Use prioritised replay memory in DQN
  --use_DQN, -ud        Use DQN agent instead of advantage-critic
                        The max number of steps per episode
                        The number of episodes to train
  --START_RENDERING START_RENDERING
                        The number of episodes to train before rendering - used for training speed up
  --update_frequency UPDATE_FREQUENCY
                        The number of steps per updating target DQN
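
To make the three epsilon arguments more concrete, here is a rough sketch of how an epsilon-greedy schedule could use them. It assumes a linear anneal, which the help text does not confirm, and the default values and the function names `linear_epsilon` and `epsilon_greedy` are made up for illustration. They are not taken from agent.py.

```python
import random


def linear_epsilon(step, epsilon_start=1.0, epsilon_end=0.05, epsilon_anneal=10_000):
    """Anneal epsilon linearly from epsilon_start to epsilon_end.

    The linear shape and the default values are assumptions for this sketch.
    After epsilon_anneal steps the value stays at epsilon_end.
    """
    fraction = min(max(step, 0) / epsilon_anneal, 1.0)
    return epsilon_start + fraction * (epsilon_end - epsilon_start)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    for step in (0, 5_000, 10_000, 20_000):
        print(step, round(linear_epsilon(step), 3))
```

With a schedule like this, early steps are almost entirely random exploration, and the agent acts mostly greedily once `epsilon_anneal` steps have passed.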