This project uses a Deep Q-Network (DQN) to train an agent to navigate a 3D environment, specifically a variant of the Banana Collector environment. It was built as part of the Udacity Deep Reinforcement Learning Nanodegree, a four-month course that I am taking.
This environment is a simplified version of the Unity ML-Agents example: it has a single agent, a smaller state space, and a discrete action space. The environment is an open 3D space that the agent must navigate. The goal is to collect as many good (yellow) bananas as possible while avoiding bad (blue) ones. A reward of +1 is provided for collecting a yellow banana, and a reward of -1 for collecting a blue banana.
The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. Given this information, the agent has to learn how to best select actions. Four discrete actions are available, corresponding to:
- 0 - move forward.
- 1 - move backward.
- 2 - turn left.
- 3 - turn right.
The task is episodic, and in order to solve the environment, the agent must get an average score of +13 over 100 consecutive episodes.
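Concretely, these states, actions, and rewards are exposed through the `unityagents` API. Below is a minimal random-policy interaction sketch to show the raw interface; the `Banana.app` file name assumes the Mac build (adjust for your OS), and you'll need the setup steps below first.

```python
# Minimal interaction loop using the unityagents API (random policy for illustration).
from unityagents import UnityEnvironment
import numpy as np

env = UnityEnvironment(file_name="Banana.app")   # assumed Mac build name
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=True)[brain_name]
state = env_info.vector_observations[0]          # 37-dimensional state
score = 0

while True:
    action = np.random.randint(4)                # pick one of the 4 discrete actions
    env_info = env.step(action)[brain_name]
    state = env_info.vector_observations[0]
    score += env_info.rewards[0]                 # +1 yellow banana, -1 blue banana
    if env_info.local_done[0]:                   # episode finished
        break

print(f"Episode score: {score}")
env.close()
```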
- Make sure you have a working version of Anaconda on your system.
- Clone this repo: `git clone https://github.com/danielnbarbosa/drlnd_navigation.git`.
- Create an Anaconda environment that contains all the required dependencies to run the project:
Mac:

```
conda create --name drlnd_navigation python=3.6
source activate drlnd_navigation
conda install -y python.app
conda install -y pytorch -c pytorch
pip install torchsummary unityagents
```
Windows:

```
conda create --name drlnd_navigation python=3.6
activate drlnd_navigation
conda install -y pytorch -c pytorch
pip install torchsummary unityagents
```
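Either way, you can sanity-check the install before moving on by confirming the core dependencies import cleanly:

```
python -c "import torch, unityagents, torchsummary; print(torch.__version__)"
```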
You will also need to download the pre-built Unity environment; you will NOT need to install Unity itself. Select the appropriate file for your operating system, then download it into the top-level directory of this repo and unzip it.
To train the agent run `python bananas.py`. This will fire up the Unity environment and output live training statistics to the command line. When training is finished you'll have a saved model in `checkpoints/solved.pth` and see some graphs that help visualize the agent's learning progress. It should take the agent around 150-250 episodes to solve the environment.
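For orientation, the training loop in `bananas.py` presumably follows the standard DQN episode structure sketched below. The `Agent` class, its methods, and the epsilon schedule here are illustrative assumptions, not the repo's exact API; the solve check matches the +13-over-100-episodes criterion above.

```python
# Sketch of a standard DQN training loop (names are illustrative, not the repo's API).
from collections import deque
import numpy as np
import torch

def train(env, agent, brain_name, n_episodes=500,
          eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    scores_window = deque(maxlen=100)        # last 100 scores, for the solve check
    eps = eps_start
    for i_episode in range(1, n_episodes + 1):
        env_info = env.reset(train_mode=True)[brain_name]
        state = env_info.vector_observations[0]
        score = 0
        while True:
            action = agent.act(state, eps)   # epsilon-greedy action selection
            env_info = env.step(action)[brain_name]
            next_state = env_info.vector_observations[0]
            reward = env_info.rewards[0]
            done = env_info.local_done[0]
            agent.step(state, action, reward, next_state, done)  # store + learn
            state = next_state
            score += reward
            if done:
                break
        scores_window.append(score)
        eps = max(eps_end, eps_decay * eps)  # decay exploration over time
        print(f"Episode {i_episode}  avg score: {np.mean(scores_window):.2f}")
        if np.mean(scores_window) >= 13.0:   # solved: +13 average over 100 episodes
            torch.save(agent.qnetwork_local.state_dict(), 'checkpoints/solved.pth')
            break
```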
To watch your trained agent interact with the environment run `python bananas.py --render`. This will load the saved weights from a checkpoint file. A previously trained model is included in this repo.
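Under the hood, `--render` presumably amounts to loading the saved weights and acting greedily. A sketch follows; the `QNetwork` architecture here is an assumption (loading only succeeds if it matches the checkpoint), and it reuses `env` and `brain_name` from the interaction sketch above.

```python
# Sketch of running a trained policy greedily (QNetwork architecture is assumed).
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Assumed architecture: small fully-connected Q-network (37 -> 64 -> 64 -> 4)."""
    def __init__(self, state_size=37, action_size=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_size))

    def forward(self, x):
        return self.net(x)

qnetwork = QNetwork()
qnetwork.load_state_dict(torch.load('checkpoints/solved.pth'))
qnetwork.eval()

env_info = env.reset(train_mode=False)[brain_name]  # train_mode=False renders at normal speed
state = env_info.vector_observations[0]
while True:
    with torch.no_grad():
        q_values = qnetwork(torch.from_numpy(state).float().unsqueeze(0))
    action = int(q_values.argmax())                 # greedy action, no exploration
    env_info = env.step(action)[brain_name]
    state = env_info.vector_observations[0]
    if env_info.local_done[0]:
        break
```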
Mac users may need to execute python using `pythonw` instead of `python` due to matplotlib requiring a framework build of python. More details here.
Feel free to experiment with modifying the hyperparameters to see how they affect training.
See the report for more insight on how I arrived at the current hyperparameters.
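For reference, a typical DQN hyperparameter block looks like the following. These are common defaults, not necessarily the values this repo ships with; check the report and the source for the actual choices.

```python
# Illustrative DQN hyperparameters (common defaults, not the repo's actual values).
BUFFER_SIZE = int(1e5)   # replay buffer capacity
BATCH_SIZE = 64          # minibatch size per learning step
GAMMA = 0.99             # discount factor
TAU = 1e-3               # soft-update rate for the target network
LR = 5e-4                # optimizer learning rate
UPDATE_EVERY = 4         # environment steps between learning updates
```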