Project 2: Continuous Control

Table of Contents
  1. Introduction
  2. Getting Started
  3. Instructions
  4. License
  5. Contact

Introduction

This project is provided by Udacity and is based on the Unity Reacher environment.

(Trained agent animation)

The environment consists of an open space containing a double-jointed robot arm that can move to target locations. A reward of +0.1 is provided for each step that the agent's hand is in the goal location. The goal location is indicated by a green balloon that moves around the arm. The goal of the agent is to maintain its position at the target location for as many time steps as possible.

The observation space consists of 33 variables corresponding to the position, rotation, velocity, and angular velocities of the arm. Each action is a vector with four numbers, corresponding to the torque applied to the two joints. Every entry in the action vector must be a number between -1 and 1 (continuous action space).
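
To make these numbers concrete, the snippet below shows how the state and action dimensions can be read from the environment. It is a minimal sketch that assumes the unityagents wrapper used in the Udacity Nanodegree; the file name Reacher.exe is only a placeholder for your local build.

    # Minimal sketch (assumes the unityagents wrapper; "Reacher.exe" is a placeholder).
    from unityagents import UnityEnvironment

    env = UnityEnvironment(file_name="Reacher.exe")
    brain_name = env.brain_names[0]            # the brain that controls the arm(s)
    brain = env.brains[brain_name]

    env_info = env.reset(train_mode=True)[brain_name]
    num_agents = len(env_info.agents)                      # 1 or 20, depending on the version
    state_size = env_info.vector_observations.shape[1]     # 33 observation variables
    action_size = brain.vector_action_space_size           # 4 torque values in [-1, 1]

    print(num_agents, state_size, action_size)
    env.close()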

For this project, there are two separate versions of the Unity environment:

  • The first version contains a single agent.
  • The second version contains 20 identical agents, each with its own copy of the environment.

One Agent

The task is episodic, and in order to solve the environment the agent must get an average score of +30 over 100 consecutive episodes. This version is located in the folder one_agent. In my experiments the environment came close to being solved, but not completely (training was either unstable or did not reach the +30 average). However, it can still be used to train a single, stable agent.

Multiagent

This is the default version, located in the main folder. It contains a total of 20 agents, which must get an average score of +30 (over 100 consecutive episodes, and over all agents). Specifically,

  • After each episode, we add up the rewards that each agent received (without discounting), to get a score for each agent. This yields 20 (potentially different) scores. We then take the average of these 20 scores.
  • This yields an average score for each episode (where the average is over all 20 agents).

The environment is considered solved when the average (over 100 episodes) of those average scores is at least +30.
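
As an illustration of this scoring rule, a rough sketch is shown below; the variable names are hypothetical and not taken from the repository.

    # Sketch of the solve criterion described above (illustrative names).
    from collections import deque
    import numpy as np

    scores_window = deque(maxlen=100)        # scores of the last 100 episodes

    # ... at the end of each episode:
    agent_returns = np.zeros(20)             # undiscounted sum of rewards per agent
    episode_score = np.mean(agent_returns)   # average over the 20 agents
    scores_window.append(episode_score)

    if len(scores_window) == 100 and np.mean(scores_window) >= 30.0:
        print("Environment solved!")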

Getting Started

  1. Clone the repo

    git clone https://github.com/josemiserra/ddpg_reacher
  2. If you don't have Anaconda or Miniconda installed, go to Miniconda and install Miniconda on your computer (Miniconda is a lightweight version of the Anaconda Python distribution).

  3. It is recommended that you create your own environment with Conda. Follow the instructions here: Conda environment. After that, open an Anaconda command prompt (or a regular prompt) and activate your environment.

    conda activate your-environment
  4. Install the packages listed in requirements.txt

    pip install -r requirements.txt
    pip install mlagents
  5. If you want to use PyTorch with CUDA, it is recommended to go to https://pytorch.org/get-started/locally/ and install PyTorch following the instructions there, according to your CUDA installation.

  6. Move into the project folder and run Jupyter Notebook.

     cd ddpg_reacher
     jupyter notebook

    Alternatively, you can run the scripts directly from the command line: execute_train.py for training and execute_test.py for testing the trained network (a rough sketch of such a test run is shown after this list).

     python execute_train.py
     python execute_test.py
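
The following is a rough sketch of what a test run looks like; it is not the actual code of execute_test.py and assumes the unityagents wrapper, with random actions standing in for the trained DDPG policy.

    # Illustrative test run (random actions stand in for the trained actor).
    import numpy as np
    from unityagents import UnityEnvironment

    env = UnityEnvironment(file_name="Reacher.exe")     # placeholder path to your build
    brain_name = env.brain_names[0]

    env_info = env.reset(train_mode=False)[brain_name]  # evaluation mode
    states = env_info.vector_observations
    scores = np.zeros(len(env_info.agents))

    while True:
        actions = np.clip(np.random.randn(len(states), 4), -1, 1)  # replace with agent.act(states)
        env_info = env.step(actions)[brain_name]
        scores += env_info.rewards
        states = env_info.vector_observations
        if np.any(env_info.local_done):
            break

    print("Average score over agents:", np.mean(scores))
    env.close()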

If your OS is not 64-bit Windows, you can download the environment for your platform from one of the links below (provided by Udacity):

(For Windows users) Check out this link if you need help with determining if your computer is running a 32-bit version or 64-bit version of the Windows operating system.

Instructions

There are two independent environments:

  • The Multi-Agent version is in the main folder. After running jupyter notebook in that folder, open Continuous_Control - Multiagent.ipynb. If you prefer the console, use the execute_train.py file.
  • The folder one_agent contains the same environment but with a single agent. After running jupyter notebook, follow the instructions in Continuous_Control.ipynb to get started with training and testing. You can also train from the console using one_agent\execute_train.py.

You will not be able to run both environments simultaneously. Just run one or the other.

For more info about the algorithms and tests done, read the file Report.md.

License

Distributed under the MIT License, as part of the Udacity Nanodegree. See LICENSE for more information.

Contact

Jose Miguel Serra Lleti - serrajosemi@gmail.com

Project Link: https://github.com/josemiserra/ddpg_reacher