BananaCollector_DoubleQLearning

An implementation of Double Q-Learning to solve Unity's Banana Collector Environment


Banana Collector

Unity's Banana Collector Environment is an environment in which an agent must collect as many yellow bananas (+1 reward each) as possible while avoiding blue bananas (-1 reward each).

The agent interacts with the environment via the following:

  • It receives observations of the current state as a vector of 37 elements
  • It can take any of 4 actions: Left, Forward, Right, or Back
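During training, actions are typically chosen epsilon-greedily over the network's Q-value estimates. A minimal sketch (the `epsilon_greedy` helper and the seeded generator are illustrative, not taken from this repository):

```python
import numpy as np

def epsilon_greedy(q_values, eps, rng):
    """With probability eps pick a random action, otherwise the greedy one."""
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

# The agent sees a 37-element state vector and must pick one of 4 actions.
rng = np.random.default_rng(0)
q_estimates = np.array([0.1, 0.9, 0.2, 0.3])  # one Q-value per action
action = epsilon_greedy(q_estimates, eps=0.0, rng=rng)  # greedy: action 1
```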

Banana Collector Environment

Sample image taken from: https://github.com/udacity/deep-reinforcement-learning/tree/master/p1_navigation

This repository trains an agent to attain an average score (over 100 episodes) of at least 13. It trains the agent using the Double DQN Reinforcement Learning algorithm.
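The key idea of Double DQN is to decouple action selection from action evaluation: the online network picks the best next action, while the target network scores it. A minimal sketch of the target computation in NumPy (function and variable names are illustrative; the repository's actual implementation lives in the notebooks):

```python
import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Compute Double DQN TD targets for a batch of transitions.

    next_q_online / next_q_target: (batch, n_actions) Q-value arrays for s'.
    The online net selects argmax actions; the target net evaluates them,
    which reduces the overestimation bias of vanilla DQN.
    """
    best_actions = np.argmax(next_q_online, axis=1)
    next_values = next_q_target[np.arange(len(rewards)), best_actions]
    # No bootstrapping past terminal states (dones == 1).
    return rewards + gamma * next_values * (1.0 - dones)
```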

Prerequisites

  • Anaconda

  • Python 3.6

  • A conda environment created as follows

    • Linux or Mac:
    conda create --name drlnd python=3.6
    source activate drlnd 
    
    • Windows
    conda create --name drlnd python=3.6 
    activate drlnd
    
  • Required dependencies

git clone https://github.com/udacity/deep-reinforcement-learning.git
cd deep-reinforcement-learning/python
pip install .

Getting Started

  1. git clone https://github.com/JoshVarty/BananaCollector_DoubleQLearning.git

  2. cd BananaCollector_DoubleQLearning

  3. Download Unity Banana Collector Environment:

  4. Unzip to git directory

  5. jupyter notebook

  6. You can train your own agent via DQN.ipynb or watch a single episode of the pre-trained network via Visualization.ipynb

Results

In my experience the agent achieves an average score of 13 after ~400 episodes of training:

(training score plot)

A sample run generated from Visualization.ipynb
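The solve criterion (an average score of at least 13 over the last 100 episodes) can be checked with a small helper; this is an illustrative sketch, not code from the notebooks:

```python
from collections import deque

def is_solved(scores, window=100, threshold=13.0):
    """True once the mean of the most recent `window` episode scores
    reaches `threshold` (requires at least `window` episodes)."""
    recent = deque(scores, maxlen=window)  # keeps only the last `window` scores
    return len(recent) == window and sum(recent) / window >= threshold
```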

Notes

  • Only tested on Ubuntu 18.04
  • Details of the learning algorithm and chosen architecture may be found in Report.md