Deep Deterministic Policy Gradients (DDPG)

Continuous Control Project

Introduction

For this project, we use the Reacher environment.

In this environment, a double-jointed arm moves to target locations. A reward of +0.1 is provided for each step that the agent's hand is in the goal location. Thus, the goal of the agent is to maintain its position at the target location for as many time steps as possible.

The observation space consists of 33 variables corresponding to position, rotation, velocity, and angular velocities of the arm. Each action is a vector with four numbers, corresponding to torque applicable to two joints. Every entry in the action vector should be a number between -1 and 1.
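As a minimal sketch of the action constraint described above, the snippet below clips a 4-element action vector so that every entry lies in [-1, 1]. The names `clip_action` and `random_action` are hypothetical helpers for illustration and are not taken from the project code.

```python
import random

STATE_SIZE = 33   # observation variables, as described above
ACTION_SIZE = 4   # torque values per action, as described above

def clip_action(action, low=-1.0, high=1.0):
    """Clamp every entry of an action vector to the range [low, high]."""
    return [max(low, min(high, a)) for a in action]

def random_action(rng, size=ACTION_SIZE):
    """Sample a Gaussian action and clip it to the valid range (illustrative only)."""
    return clip_action([rng.gauss(0.0, 1.0) for _ in range(size)])

rng = random.Random(0)
action = random_action(rng)
print(len(action), all(-1.0 <= a <= 1.0 for a in action))
```

Clipping is only one way to keep actions in range; an actor network with a `tanh` output layer would satisfy the same bound by construction.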

Distributed Training

This project solves the environment using the first version of the Unity environment, which contains a single agent.

Solving the Environment

The task is episodic. To solve the environment, the agent must achieve an average score of +30 over 100 consecutive episodes in the 1-agent environment.
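The solving criterion can be checked with a rolling window over episode scores. The sketch below is an illustration under that reading, not the notebook's actual training loop; `first_solved_episode` is a hypothetical helper and the score sequence is made-up data.

```python
from collections import deque

TARGET_SCORE = 30.0   # required average score
WINDOW = 100          # number of consecutive episodes

def first_solved_episode(episode_scores, target=TARGET_SCORE, window=WINDOW):
    """Return the 1-based episode at which the rolling average first
    reaches the target, or None if it never does."""
    recent = deque(maxlen=window)  # keeps only the latest `window` scores
    for episode, score in enumerate(episode_scores, start=1):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None

# Made-up scores: 50 episodes scoring 0, then 150 episodes scoring 40.
fake_scores = [0.0] * 50 + [40.0] * 150
print(first_solved_episode(fake_scores))  # 125
```

Note that the average is only evaluated once a full window of 100 episodes is available, so the environment cannot count as solved before episode 100.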

Getting Started

Download the environment from one of the links below. You need only select the environment that matches your operating system:
- Version 1: One (1) Agent
  - Linux: click here
  - Mac OSX: click here
  - Windows (32-bit): click here
  - Windows (64-bit): click here
(For Windows users) Check out this link if you need help with determining if your computer is running a 32-bit version or 64-bit version of the Windows operating system.

(For AWS) If you'd like to train the agent on AWS (and have not enabled a virtual screen), then please use this link (version 1) or this link (version 2) to obtain the "headless" version of the environment. You will not be able to watch the agent without enabling a virtual screen, but you will be able to train the agent. (To watch the agent, you should follow the instructions to enable a virtual screen, and then download the environment for the Linux operating system above.)

I solved the environment using the 1-agent approach. For my setup, which was Windows based, I downloaded the first version of the environment and stored it in a folder with the following name:

Reacher_01_Windows_x86_64
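A rough sketch of how the downloaded folder might be used to start the environment is shown below. The executable name `Reacher.exe` and the helper names are assumptions made for illustration; check the actual file inside your folder. `UnityEnvironment` comes from the `unityagents` package installed later in this guide, so it is imported lazily here.

```python
from pathlib import Path

def reacher_executable(folder="Reacher_01_Windows_x86_64", exe_name="Reacher.exe"):
    """Build the path to the environment binary.
    exe_name is an assumption; adjust it to match your download."""
    return Path(folder) / exe_name

def load_env(path):
    """Start the Unity environment (requires the unityagents package)."""
    from unityagents import UnityEnvironment  # imported only when needed
    return UnityEnvironment(file_name=str(path))

exe_path = reacher_executable()
print(exe_path.name)
# env = load_env(exe_path)  # uncomment once the environment is downloaded
```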

The code base has been tailored to take advantage of both solutions. It simply works with a list of agents, so the 1-agent solution is handled as a list of length 1.
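The list-of-agents convention described above might look like the following sketch, where per-agent values are always kept in a list so the same code path serves one agent or many. `as_agent_list` and `mean_score` are hypothetical names, not functions from this repository.

```python
def as_agent_list(values):
    """Return per-agent values as a list, wrapping a single value
    so that one agent becomes a list of length 1."""
    if isinstance(values, (list, tuple)):
        return list(values)
    return [values]

def mean_score(per_agent_scores):
    """Average the episode score over all agents in the list."""
    scores = as_agent_list(per_agent_scores)
    return sum(scores) / len(scores)

print(as_agent_list(0.5))        # [0.5]
print(mean_score([1.0, 3.0]))    # 2.0
```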

Make sure you have downloaded and installed Anaconda. You can download it from https://www.anaconda.com/distribution/
Now you can create your environment. Since this environment refers to a Udacity project for the Deep Reinforcement Learning Nanodegree, let's call our environment drlnd.

Linux or Mac:
```
conda create --name drlnd python=3.6
source activate drlnd
```
Windows:
```
conda create --name drlnd python=3.6 
activate drlnd
```
We will be working with pytorch version 0.4.0 (an early version), so make sure that you install this version of pytorch first by typing:
```
conda install pytorch=0.4.0 -c pytorch
```
Perform a minimal installation of the OpenAI Gym environment (see the instructions at https://github.com/openai/gym).
For the rest of the prerequisites, type:
```
pip install .
```
The command above assumes that your working folder contains the setup.py file, which includes UnityAgents, and the requirements.txt file, which lists other useful packages (both are included in the repository).

Create a Python execution backend (kernel) for Jupyter for the drlnd environment:

python -m ipykernel install --user --name drlnd --display-name "drlnd"

Now you are ready to use not only the UnityAgents environment, but OpenAI Gym as well. You are all set to start playing with reinforcement learning environments!

Other useful utilities, including Jupyter Notebook, will also be installed if you follow these directions, so consider the installation steps above a complete guide to setting up your RL environments!

Instructions

Run the notebooks

Continuous_Control_1_agent.ipynb for the 1 Agent Solution

Follow the instructions in the notebook and run the cells accordingly.

Report

There is a separate file named Report.ipynb, which gives a detailed explanation of the code and how it works.

joypoddar/drlnd-continuous-control