Continuous control using Policy-based Reinforcement Learning

Introduction

For this project, I use a slightly modified version of the Unity Reacher environment. The animations below compare a trained agent with a "dumb" (random) agent.

Trained Agent:

Random Agent:

In this environment, a double-jointed arm can move to target locations. A reward of +0.1 is provided for each step that the agent's hand is in the goal location. Thus, the goal is to keep the arm's hand at the target location for as many time steps as possible.

The observation space consists of 33 variables corresponding to the position, rotation, velocity, and angular velocity of the arm. Each action is a vector of four numbers, corresponding to the torque applied to the two joints. Every entry in the action vector should be a number between -1 and 1. The task is considered solved when the average cumulative reward over the last 100 episodes is greater than or equal to 30.
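To make the action format and the solving criterion concrete, here is a minimal sketch in plain Python. It is not the code from this repository: the helper names `random_action` and `is_solved`, the use of clipped Gaussian noise for the random agent, and the scores fed in at the end are all assumptions made for illustration.

```python
import random
from collections import deque

ACTION_SIZE = 4      # four torque values per action, as described above
SOLVED_SCORE = 30.0  # required average score
WINDOW = 100         # number of most recent episodes to average over


def random_action(rng):
    """Sample one action for a random agent (hypothetical helper).

    Gaussian noise is clipped so that every entry stays in [-1, 1].
    """
    return [max(-1.0, min(1.0, rng.gauss(0.0, 1.0))) for _ in range(ACTION_SIZE)]


def is_solved(recent_scores):
    """Return True once WINDOW episodes are recorded and their mean reaches SOLVED_SCORE."""
    return len(recent_scores) == WINDOW and sum(recent_scores) / WINDOW >= SOLVED_SCORE


rng = random.Random(0)
action = random_action(rng)
print(action)

# A deque with maxlen keeps only the last WINDOW episode scores.
recent_scores = deque(maxlen=WINDOW)
for score in [31.0] * WINDOW:  # made-up scores, purely for illustration
    recent_scores.append(score)
print(is_solved(recent_scores))
```

The bounded deque drops the oldest score automatically, so the check always covers exactly the last 100 episodes once that many have been played.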

Getting Started

Note: This guide has only been tested on macOS, with pyenv used for Python version management.

Please follow these steps to be able to run this project:

Install build tools (such as a C++ compiler) by installing Xcode and then the Xcode command-line tools, following one of the various guides.
Install dependencies. It is highly recommended to install all dependencies in a virtual environment (see guide).
- Install the Unity ML-Agents Toolkit by following the instructions on this page (the official GitHub repository of the Unity ML-Agents Toolkit). Most users will likely only need to install the mlagents and unityagents packages with the following command:
```
pip install mlagents unityagents
```
  It is highly recommended to use Python no higher than 3.7, because TensorFlow (one of the dependencies of mlagents) is only compatible with Python 3.7.
- Install PyTorch with
```
pip install torch torchvision
```
  Please see the official installation guide for more information.
Alternatively, you can install the required dependencies from requirements.txt. To do that, just run the following command in your terminal (preferably in the project's virtual environment):
```
pip install -r requirements.txt
```
Please note: this method is somewhat inefficient, because requirements.txt includes some packages that are not actually used in this project. It describes an "R&D" environment used for experiments and testing.
Download the environment from one of the links below. Select only the environment that matches your operating system:
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here
Place the file in the root of the project repository.
Run main.py from the terminal with
```
python main.py
```
or simply run main.py in your IDE.
Run visualize.py to see the trained agent in action with
```
python visualize.py
```
or, again, simply run this file in your IDE.
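If either script fails with an import error, a quick check of the interpreter version and the installed packages can help narrow the problem down. The following is a rough sketch and not part of this repository: the helper names `python_ok` and `missing_packages` are hypothetical, and it assumes that each package's import name matches the name used in the pip commands above.

```python
import importlib.util
import sys

# Package names taken from the pip commands in this guide; the import
# names are assumed to match the pip names.
REQUIRED = ["mlagents", "unityagents", "torch", "torchvision"]
MAX_PYTHON = (3, 7)  # this guide recommends Python no higher than 3.7


def python_ok(version=None):
    """Return True if the (major, minor) version is at or below MAX_PYTHON."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) <= MAX_PYTHON


def missing_packages(names):
    """Return the names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version OK:", python_ok())
print("Missing packages:", missing_packages(REQUIRED))
```

`importlib.util.find_spec` only looks a top-level package up without importing it, so the check stays fast even for large packages such as torch.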

Technical report

Check out the technical report for implementation details.
