Banana Collecting Agent - Project-1 Navigation

This project is part of Udacity's Deep Reinforcement Learning Nanodegree and is called Project-1 Navigation. The model was trained on a MacBook Air (2017) with 8 GB RAM and an Intel Core i5 processor.

Description

The project involves an agent tasked with collecting as many yellow bananas as possible while ignoring the blue bananas. The environment is built with Unity and can be found in Unity ML-Agents. On collecting a yellow banana the agent gets a reward of +1, and on collecting a blue banana the agent is given a reward (or punishment) of -1.

The state space has 37 dimensions and the agent can perform 4 different actions:

0 - move forward

1 - move backward

2 - turn left

3 - turn right
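The snippet below is a minimal sketch of how this environment is driven, assuming the unityagents package that ships with the Nanodegree and a macOS Banana.app build (the file name differs per OS):

    from unityagents import UnityEnvironment
    import numpy as np

    # File name is OS-specific; Banana.app is the macOS build (an assumption here)
    env = UnityEnvironment(file_name="Banana.app")
    brain_name = env.brain_names[0]

    env_info = env.reset(train_mode=False)[brain_name]
    state = env_info.vector_observations[0]    # the 37-dimensional state
    score = 0

    while True:
        action = np.random.randint(4)          # random action in {0, 1, 2, 3}
        env_info = env.step(action)[brain_name]
        state = env_info.vector_observations[0]
        score += env_info.rewards[0]           # +1 for yellow, -1 for blue
        if env_info.local_done[0]:             # episode finished
            break

    print("Score:", score)
    env.close()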

The agent's task is episodic, and the environment is considered solved when the agent scores an average of at least +13 over 100 consecutive episodes.

For this task I used a Deep Q-Network: the current 37-dimensional state is passed through two multilayer perceptron (fully connected) hidden layers with ReLU activations, followed by an output layer that gives the action-values for all possible actions.
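During training, actions are chosen ε-greedily from these action-values. A minimal sketch, with illustrative names and assuming the state arrives as a NumPy array:

    import random
    import numpy as np
    import torch

    def act(qnetwork, state, eps):
        """Pick an action ε-greedily from the network's action-values."""
        state_t = torch.from_numpy(state).float().unsqueeze(0)
        qnetwork.eval()
        with torch.no_grad():
            action_values = qnetwork(state_t)
        qnetwork.train()
        if random.random() > eps:
            return int(np.argmax(action_values.cpu().numpy()))  # exploit
        return random.randrange(4)                              # explore

After every episode, ε is decayed as ε ← max(0.01, 0.995 · ε), per the hyperparameter table at the end of this document.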

Demo

Demonstration of the trained agent

The thing I truly loved is how the agent recognizes when it is stuck and chooses the backward action to get out of that state ❤️.

Steps to run

  1. Clone the repository:

    user@programmer:~$ git clone https://github.com/frankhart2018/banana-collecting-agent
  2. Install the requirements:

    user@programmer:~$ pip install -r requirements.txt
  3. Download the Unity environment specific to your OS.
  4. Unzip the downloaded environment file.
  5. Update the Banana app location in the indicated place according to your OS.

  6. If you prefer using Jupyter Notebook, launch a notebook instance:

    user@programmer:~$ jupyter-notebook

    ➡️ For re-training the agent, use Banana Collecting Agent.ipynb

    ➡️ For testing the agent, use Banana Agent Tester.ipynb

    If you prefer to run a Python script instead:

    ➡️ To re-train the agent, run:

    user@programmer:~$ python train.py

    ➡️ To test the agent, run:

    user@programmer:~$ python test.py

Technologies used

  1. Unity ML-Agents
  2. PyTorch
  3. NumPy
  4. Matplotlib

Algorithms used

  1. Multilayer Perceptron.
  2. Deep Q-Network. To learn more about this algorithm, read the original paper by DeepMind: Human-level control through deep reinforcement learning.
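Concretely, DQN trains the online network by minimizing the squared temporal-difference error against a separate target network. For a sampled transition (s, a, r, s'), the loss is:

    L(θ) = ( r + γ · max_a' Q(s', a'; θ⁻) − Q(s, a; θ) )²

where θ are the online network's parameters and θ⁻ are the target network's parameters, updated here via the soft update described below.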

Model description

The Q-Network has three dense (fully connected) layers. The first two layers have 64 nodes each and use the ReLU activation function. The final (output) layer has 4 nodes with linear activation (i.e., no activation at all). The network takes the 37-dimensional current state as input and outputs 4 action-values corresponding to the possible actions the agent can take.
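A minimal PyTorch sketch of this architecture (the class name and default arguments are illustrative):

    import torch.nn as nn

    class QNetwork(nn.Module):
        """Maps a 37-dimensional state to action-values for the 4 actions."""

        def __init__(self, state_size=37, action_size=4, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_size, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, action_size),  # linear output: 4 action-values
            )

        def forward(self, state):
            return self.net(state)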

The neural network is trained with the Adam optimizer, using mean squared error (MSE) as the loss function.
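Putting these together with the gamma, tau, and learning-rate values from the table below, one learning step on a sampled mini-batch might look like this sketch (it reuses the QNetwork class from above and assumes actions is a long tensor of shape [batch, 1] and dones is a float tensor of 0s and 1s):

    import torch.nn.functional as F
    import torch.optim as optim

    GAMMA, TAU, LR = 0.99, 0.001, 0.0005

    qnetwork_local = QNetwork()
    qnetwork_target = QNetwork()
    optimizer = optim.Adam(qnetwork_local.parameters(), lr=LR)

    def learn(states, actions, rewards, next_states, dones):
        # TD target: r + γ · max_a' Q_target(s', a'), zeroed at terminal states
        q_next = qnetwork_target(next_states).detach().max(1)[0].unsqueeze(1)
        q_targets = rewards + GAMMA * q_next * (1 - dones)
        q_expected = qnetwork_local(states).gather(1, actions)

        loss = F.mse_loss(q_expected, q_targets)  # MSE loss, as described above
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Soft update: θ_target ← τ·θ_local + (1 − τ)·θ_target
        for t_p, l_p in zip(qnetwork_target.parameters(),
                            qnetwork_local.parameters()):
            t_p.data.copy_(TAU * l_p.data + (1 - TAU) * t_p.data)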

The following image provides a pictorial representation of the Q-Network model:

Pictorial representation of Q-Network

The following image provides the plot of score vs. episode number:

Plot of score vs. episode number

Hyperparameters used

Hyperparameter       Value    Description
------------------   ------   ----------------------------------------------------
Buffer size          100000   Maximum size of the replay buffer
Batch size           64       Batch size for sampling from the replay buffer
Gamma (γ)            0.99     Discount factor for calculating the return
Tau (τ)              0.001    Factor for the soft update of target parameters
Learning rate (α)    0.0005   Learning rate for the neural network
Update every (C)     4        Number of time steps after which the soft update is performed
Epsilon (ε)          1.0      Initial value for epsilon-greedy action selection
Epsilon decay rate   0.995    Rate by which epsilon decays after every episode
Epsilon minimum      0.01     The minimum value of epsilon
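For completeness, a bare-bones replay buffer matching the buffer and batch sizes above could look like this sketch (the class and field names are illustrative):

    import random
    from collections import deque, namedtuple

    import numpy as np
    import torch

    Experience = namedtuple("Experience",
                            ["state", "action", "reward", "next_state", "done"])

    class ReplayBuffer:
        def __init__(self, buffer_size=100000, batch_size=64):
            self.memory = deque(maxlen=buffer_size)  # oldest experiences drop off
            self.batch_size = batch_size

        def add(self, state, action, reward, next_state, done):
            self.memory.append(Experience(state, action, reward, next_state, done))

        def sample(self):
            batch = random.sample(self.memory, self.batch_size)
            stack = lambda xs, dt: torch.as_tensor(np.vstack(xs), dtype=dt)
            return (stack([e.state for e in batch], torch.float32),
                    stack([e.action for e in batch], torch.int64),
                    stack([e.reward for e in batch], torch.float32),
                    stack([e.next_state for e in batch], torch.float32),
                    stack([e.done for e in batch], torch.float32))

        def __len__(self):
            return len(self.memory)

Sampling uniformly at random from this buffer breaks the correlation between consecutive experiences, which is one of the two stabilizing ideas from the DeepMind paper (the other being the target network).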