
Tiny implementation of Deep Q-Network with TensorFlow

Human-Level Control through Deep Reinforcement Learning

This is a tiny TensorFlow implementation of the Deep Q-Network (DQN) described in Human-Level Control through Deep Reinforcement Learning.

This code is based on two existing GitHub repositories:

  • tiny-dqn, for the compact TensorFlow implementation
  • DQN-tensorflow, for the replay memory, preprocessing, and parameter settings

This implementation contains:

  • Deep Q-network and Q-learning
  • Random start game
  • Experience replay memory
    • to reduce the correlations between consecutive updates
  • Target network for Q-learning, held fixed between periodic updates
    • to reduce the correlations between target and predicted Q-values
  • Huber loss instead of clipping the gradients of the mean-squared-error (MSE) loss (this differs from the paper)
    • to improve the stability of training
  • Reward clipping to -1 and +1
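
The Huber loss and reward clipping items above can be illustrated with a minimal pure-Python sketch. This is not the code from main.py: the function names `huber_loss` and `clip_reward` and the threshold `delta=1.0` are assumptions made for illustration, and clipping is interpreted here as clamping rewards into the range [-1, +1].

```python
def huber_loss(td_error, delta=1.0):
    """Huber loss of a single TD error.

    Quadratic (like MSE) when |td_error| <= delta, linear beyond it,
    so the gradient magnitude never exceeds delta.
    `delta=1.0` is an assumed value, not taken from this repository.
    """
    abs_error = abs(td_error)
    if abs_error <= delta:
        return 0.5 * td_error ** 2
    return delta * (abs_error - 0.5 * delta)


def clip_reward(reward, low=-1.0, high=1.0):
    """Clamp a raw game reward into [low, high] (assumed interpretation)."""
    return max(low, min(high, reward))


# Small errors are penalized quadratically, large ones only linearly.
small = huber_loss(0.5)
large = huber_loss(3.0)
clipped = clip_reward(10.0)
```

Because the loss grows only linearly for large errors, a single outlier transition cannot produce an arbitrarily large gradient, which is the stability benefit the list refers to.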

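As a rough sketch of how experience replay and a fixed target network fit together, the example below uses only the standard library. The class `ReplayMemory`, the function `q_learning_target`, and the discount `gamma=0.99` are hypothetical names and values chosen for illustration; they are not taken from this repository.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=None):
        # deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def q_learning_target(reward, next_q_values, done, gamma=0.99):
    """Return r for terminal transitions, else r + gamma * max Q_target(s', a').

    `next_q_values` is assumed to come from the frozen target network,
    not from the network currently being trained.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

In a training loop, the online network's weights would be copied into the target network only every fixed number of steps, so the values passed as `next_q_values` stay stable between copies.
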
So far, I have only tested this code with Breakout-v0.


python main.py -v


python main.py --test --render


  • For academic and non-commercial use only
  • Apache License 2.0