This project trains a Deep Q-Learning agent to play the Banana collection game in Unity ML-Agents.
Reward: The agent gets a reward of +1 for collecting a yellow banana and a reward of -1 for collecting a blue banana. Thus, the goal of the agent is to maximise the long-term reward by collecting as many yellow bananas as possible while avoiding blue bananas.
State: Each state is a 37-dimensional vector containing the agent's velocity along with ray-based perception of objects around the agent's forward direction. Given this information, the agent has to learn how to best select actions.
Actions: The agent can take four different actions:
0 - move forward
1 - move backward
2 - turn left
3 - turn right
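A common way to choose among these four actions in Deep Q-Learning is epsilon-greedy selection: act randomly with probability epsilon, otherwise pick the action with the highest estimated value. The sketch below is illustrative only and is not taken from agent.py; the function name `select_action` and its parameters are assumptions.

```python
import random

N_ACTIONS = 4  # 0: forward, 1: backward, 2: turn left, 3: turn right


def select_action(q_values, epsilon, rng=random):
    """Pick an action index with epsilon-greedy exploration.

    q_values: one estimated action value per action (hypothetical input,
              e.g. the output of a Q-network for the current state).
    epsilon:  probability of choosing a uniformly random action.
    rng:      source of randomness; pass random.Random(seed) for
              reproducible runs.
    """
    if len(q_values) != N_ACTIONS:
        raise ValueError(f"expected {N_ACTIONS} action values")
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)  # explore: uniform random action
    # exploit: index of the largest estimated value
    return max(range(N_ACTIONS), key=lambda a: q_values[a])
```

During training, epsilon is usually started high and decayed so that the agent explores early and exploits what it has learned later.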
Termination: An episode terminates after 300 time steps.
The environment is considered solved when the agent achieves an average score of +13 over 100 consecutive episodes.
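The solving criterion can be checked while training by keeping only the most recent 100 episode scores. This is a minimal sketch under that assumption; `episodes_to_solve` is a hypothetical helper, not a function from the repository.

```python
from collections import deque


def episodes_to_solve(scores, window=100, target=13.0):
    """Return the 1-based episode count at which the mean of the last
    `window` scores first reaches `target`, or None if it never does."""
    recent = deque(maxlen=window)  # keeps only the latest `window` scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        # Only judge the average once a full window has been collected.
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None
```

For example, `episodes_to_solve(all_scores)` could be called on the list of per-episode scores collected by a training loop.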
Clone the repository and change into its directory:
git clone https://github.com/abhisheksgumadi/deep-q-learning.git
cd deep-q-learning
Install Jupyter Notebook with the command:
pip install jupyter
Then, open the Navigation.ipynb notebook:
jupyter notebook Navigation.ipynb
Download the Banana environment for Unity.
The code consists of the following modules:
Navigation.ipynb - the main notebook
agent.py - defines the Agent that is being trained
model.py - defines the PyTorch model for the Deep Q Network
checkpoint.pth - the final trained agent, which achieves an average reward of at least 13 points over 100 consecutive episodes
Please follow the code in Navigation.ipynb to train the agent.
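For orientation, Deep Q-Learning trains the network toward a temporal-difference target built from the observed reward and the estimated value of the next state. The sketch below shows the standard Q-learning target using plain Python values; it is not code from agent.py, and the default discount factor of 0.99 is an assumed, illustrative value.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """Standard Q-learning target: r + gamma * max_a' Q(s', a').

    next_q_values: estimated action values for the next state
                   (in DQN these typically come from a target network).
    done:          True if the episode ended on this transition, in which
                   case there is no future value to bootstrap from.
    gamma:         discount factor; 0.99 is an assumption, not a value
                   taken from the repository.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

The network's prediction for the action actually taken is then regressed toward this target, typically with a mean-squared-error loss.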
The graph below plots the average reward over 100 consecutive episodes. Each point on the X axis represents a window of 100 consecutive episodes, and the Y axis shows the average reward over that window.
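A curve of this kind can be computed from the per-episode scores with a rolling mean. The helper below is a hypothetical sketch, not the notebook's plotting code; it returns one average per complete 100-episode window.

```python
def rolling_average(scores, window=100):
    """Mean score of every run of `window` consecutive episodes.

    Returns one value per complete window, so the result has
    len(scores) - window + 1 entries (empty if there are too few scores).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running_sum = 0.0
    for i, score in enumerate(scores):
        running_sum += score
        if i >= window:
            running_sum -= scores[i - window]  # drop the score that left the window
        if i >= window - 1:
            averages.append(running_sum / window)
    return averages
```

The returned list can be passed directly to a plotting library as the Y values, with the window index on the X axis.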
I have recorded a video of the trained agent in action. To watch the video, please click on the image below.