Train an agent with the DQN algorithm to navigate a virtual world and collect as many yellow bananas as possible while avoiding blue bananas.
Reward: a reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana.
State: the state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. Given this information, the agent has to learn how to best select actions.
Excerpt from iandanforth's description of the state space:
The state space has 37 dimensions and contains the agent's velocity,
along with ray-based perception of objects around the agent's forward direction.
Ray Perception (35)
7 rays projecting from the agent at the following angles (and returned in this order):
[20, 90, 160, 45, 135, 70, 110] # 90 is directly in front of the agent
Ray (5)
Each ray is projected into the scene. If it encounters one of four detectable objects, the value at that position in the array is set to 1. Finally, there is a distance measure, which is a fraction of the ray length.
[Banana, Wall, BadBanana, Agent, Distance]
Example:
[0, 1, 1, 0, 0.2]
There is a BadBanana detected 20% of the way along the ray, and a wall behind it.
Velocity of Agent (2)
Left/right velocity (usually near 0)
Forward/backward velocity (0-11.2)
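As a concrete illustration, here is a minimal sketch of how a 37-dimensional observation decomposes under the layout above (the helper name and slicing are assumptions for illustration, not part of this repo's API):

```python
import numpy as np

# Assumed layout: 7 rays x 5 values = 35 entries, then 2 velocity components.
RAY_ANGLES = [20, 90, 160, 45, 135, 70, 110]  # 90 is directly in front
RAY_LABELS = ["Banana", "Wall", "BadBanana", "Agent", "Distance"]

def describe_state(state):
    """Pretty-print a 37-dim observation (hypothetical helper, not the repo's API)."""
    assert len(state) == 37
    rays = np.asarray(state[:35]).reshape(7, 5)
    for angle, ray in zip(RAY_ANGLES, rays):
        readings = dict(zip(RAY_LABELS, ray.tolist()))
        print(f"ray @ {angle:>3} deg: {readings}")
    print(f"velocity (left/right, forward/backward): {list(state[35:])}")
```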
Actions: four discrete actions are available, corresponding to:
- `0` - move forward
- `1` - move backward
- `2` - turn left
- `3` - turn right
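A DQN agent typically chooses among these four actions epsilon-greedily from its Q-value estimates; a minimal sketch (the function and its arguments are illustrative, not necessarily this repo's implementation):

```python
import random
import numpy as np

def epsilon_greedy(q_values, eps):
    """Select an action index given Q-value estimates for the 4 actions.

    Illustrative helper, not the repo's API.
    q_values: array-like of shape (4,), e.g. the Q-network's output.
    eps: exploration rate in [0, 1].
    """
    if random.random() < eps:
        return random.randrange(len(q_values))  # explore: uniform over 0..3
    return int(np.argmax(q_values))             # exploit: greedy action
```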
Termination: after 300 time steps.
The task is episodic and discounted; in order to solve the environment, the agent must achieve an average score of +13 over 100 consecutive episodes.
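This solving criterion is typically checked with a sliding window over per-episode returns; a minimal sketch, assuming a sequence of episode scores (names are illustrative, not the repo's API):

```python
from collections import deque

def solved_at(episode_scores, target=13.0, window=100):
    """Return the episode index at which the average score over the last
    `window` episodes first reaches `target`, or None if it never does.

    Illustrative helper; episode_scores is a sequence of per-episode returns.
    """
    scores_window = deque(maxlen=window)
    for i_episode, score in enumerate(episode_scores, start=1):
        scores_window.append(score)
        if len(scores_window) == window and sum(scores_window) / window >= target:
            return i_episode
    return None
```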
1. Setup the Environment

- Clone the repository

```bash
git clone https://github.com/plopd/navigation.git
cd navigation
```
- Create and activate a new environment with Python 3.6.

  - Linux or Mac:

    ```bash
    conda create --name navigation python=3.6
    source activate navigation
    ```

  - Windows:

    ```bash
    conda create --name navigation python=3.6
    activate navigation
    ```
- Clone and install the openai-gym repository

```bash
cd ..
git clone https://github.com/openai/gym.git
cd gym
pip install -e .
```
- Create an IPython kernel for the `navigation` environment.

```bash
python -m ipykernel install --user --name navigation --display-name "navigation"
```
- Start a (local) Jupyter notebook server

```bash
cd ../navigation
jupyter-notebook
```
2. Download the Unity Environment
- Download the environment from one of the links below. You need only select the environment that matches your operating system:
  - Linux: click here
  - Mac OSX: click here
  - Windows (32-bit): click here
  - Windows (64-bit): click here
- Place the file in the root of this repo, and unzip (or decompress) the file.
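Once unzipped, the environment can be loaded from Python. A minimal sketch, assuming the standard `unityagents` package used by this family of Udacity projects (the `file_name` depends on the build you downloaded):

```python
from unityagents import UnityEnvironment

# Point file_name at the unzipped build; the exact path is an assumption,
# e.g. "Banana_Linux/Banana.x86_64" on Linux or "Banana.app" on macOS.
env = UnityEnvironment(file_name="Banana_Linux/Banana.x86_64")

brain_name = env.brain_names[0]               # the default brain
brain = env.brains[brain_name]

env_info = env.reset(train_mode=True)[brain_name]
state = env_info.vector_observations[0]       # the 37-dim state described above
action_size = brain.vector_action_space_size  # 4 discrete actions

env.close()
```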
The code is structured as follows:

- `checkpoints` - holds checkpoints of models for the agent
- `results` - camera-ready graphs and figures highlighting the results of the training
- `src` - agent and models used
- `utils` - useful code reusable for different agents (e.g. a replay buffer, sketched below)
- `Navigation.ipynb` - tutorial notebook to help users go through the training and testing pipeline
- `REPORT.md` - outlines details on the algorithms used to train the agent
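As an illustration of the kind of utility kept in `utils`, here is a minimal uniform replay buffer (a generic sketch, not necessarily this repo's implementation):

```python
import random
from collections import deque, namedtuple

# Generic sketch; the repo's utils may differ.
Experience = namedtuple("Experience",
                        ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size buffer that stores experience tuples for uniform sampling."""

    def __init__(self, buffer_size=int(1e5), batch_size=64):
        self.memory = deque(maxlen=buffer_size)
        self.batch_size = batch_size

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        return random.sample(self.memory, k=self.batch_size)

    def __len__(self):
        return len(self.memory)
```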
Follow the instructions in `Navigation.ipynb` to get started with training and/or watching a smart agent.