RobotArm_Continuous_Control

My solution to Project 2 - Continuous Control


Continuous Control

Train a Set of Robotic Arms

Introduction

In this project, we work with the Reacher environment and solve it using reinforcement learning (RL) methods for continuous action spaces.


In this environment, a double-jointed arm can move to target locations. A reward of +0.1 is provided for each step that the agent's hand is in the goal location. Thus, the goal of the agent is to maintain its position at the target location for as many time steps as possible.

The observation space consists of 33 variables corresponding to position, rotation, velocity, and angular velocities of the arm. Each action is a vector with four numbers, corresponding to torque applicable to two joints. Every entry in the action vector should be a number between -1 and 1.
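Exploration noise added to a policy's output can push entries outside the valid range, so actions are commonly clipped before they are sent to the environment. The following is a minimal stdlib sketch of that step. The helper name `clip_action` is hypothetical, and code that already uses NumPy would normally call `np.clip` instead.

```python
from typing import List

ACTION_SIZE = 4                    # torques for the two joints, per the description above
ACTION_LOW, ACTION_HIGH = -1.0, 1.0

def clip_action(action: List[float]) -> List[float]:
    """Clamp every entry of a raw 4-dimensional action vector into [-1, 1]."""
    if len(action) != ACTION_SIZE:
        raise ValueError(f"expected {ACTION_SIZE} entries, got {len(action)}")
    return [min(max(a, ACTION_LOW), ACTION_HIGH) for a in action]

# A noisy policy output can leave the valid range; clipping brings it back.
print(clip_action([0.3, -1.7, 2.0, -0.25]))  # [0.3, -1.0, 1.0, -0.25]
```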

Distributed Training

This project can be done with either of two versions of the Unity environment:

  • The first version contains a single agent.
  • The second version contains 20 identical agents, each with its own copy of the environment.

In this repository we solve version 1 with a single agent.

Solving the Environment

  • Set-up: Double-jointed arm which can move to target locations.
  • Goal: Each agent must move its hand to the goal location and keep it there.
  • Agents: The environment contains 10 agents linked to a single Brain.
    • The Udacity versions provide either a single agent or 20 agents.
  • Agent Reward Function (independent):
    • +0.1 Each step agent's hand is in goal location.
  • Brains: One Brain with the following observation/action space.
    • Vector Observation space: 33 variables corresponding to position, rotation, velocity, and angular velocities of the two arm Rigidbodies.
    • Vector Action space: (Continuous) Size of 4, corresponding to torque applicable to two joints.
    • Visual Observations: None.
  • Reset Parameters: Two, corresponding to goal size and goal movement speed.
  • Benchmark Mean Reward: 30

The task is episodic, and in order to solve the environment, the agent must get an average score of +30 over 100 consecutive episodes.
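The solving criterion can be tracked with a fixed-length window over the episode scores. The following is a minimal sketch. The function name `episodes_to_solve` is illustrative, and in practice each score would come from the episode loop in the notebook rather than from a precomputed list.

```python
from collections import deque
from typing import Iterable, Optional

def episodes_to_solve(scores: Iterable[float], window: int = 100,
                      target: float = 30.0) -> Optional[int]:
    """Return the 1-based episode at which the mean of the last `window`
    scores first reaches `target`, or None if that never happens."""
    recent = deque(maxlen=window)  # keeps only the most recent `window` scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        # The criterion only applies once a full window of episodes exists.
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None

# 50 episodes at 20.0 followed by episodes at 35.0: the 100-episode mean
# first reaches 30.0 at episode 117.
print(episodes_to_solve([20.0] * 50 + [35.0] * 150))  # 117
```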

Setting up the environment

The environment can be downloaded for each operating system from the links below.

Approach and solution

The notebook Continuous_Control.ipynb contains the code that sets up the environment and runs the outer episode loop of the reinforcement learning problem. Our solution uses Deep Deterministic Policy Gradient (DDPG), an actor-critic method, built only from standard feedforward layers and combined with experience replay (see this paper).
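A DDPG update relies on three pieces: a replay buffer sampled uniformly at random, a critic target computed from slowly moving target networks, and soft updates of those target networks. The stdlib sketch below illustrates each of them under stated assumptions. The names, the default `gamma` and `tau`, and the representation of network parameters as flat lists of floats are all hypothetical and are not taken from ddpg_agent.py.

```python
import random
from collections import deque, namedtuple
from typing import List, Sequence

Transition = namedtuple("Transition", "state action reward next_state done")

class ReplayBuffer:
    """Fixed-size FIFO store of transitions, sampled uniformly at random."""

    def __init__(self, capacity: int, seed: int = 0):
        self.memory = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done) -> None:
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.memory), k=batch_size)

    def __len__(self) -> int:
        return len(self.memory)

def critic_target(reward: float, done: bool, next_q: float, gamma: float = 0.99) -> float:
    """Critic target y = r + gamma * (1 - done) * Q'(s', mu'(s')).

    `next_q` stands for the target critic's value at the next state and the
    target actor's action; it is passed in directly to keep the sketch framework-free.
    """
    return reward + gamma * (1.0 - float(done)) * next_q

def soft_update(target: Sequence[float], local: Sequence[float], tau: float = 1e-3) -> List[float]:
    """Soft target update: theta_target <- tau * theta_local + (1 - tau) * theta_target."""
    return [tau * l + (1.0 - tau) * t for t, l in zip(target, local)]

buffer = ReplayBuffer(capacity=3)
for i in range(5):  # only the last 3 transitions are kept
    buffer.add([float(i)], [0.0] * 4, 0.1, [float(i + 1)], False)
batch = buffer.sample(2)
print(len(buffer), len(batch))  # 3 2
```

Passing `done=True` zeroes the bootstrap term, so the target of a terminal transition is just its reward.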

The DDPG agent and the replay memory buffer are implemented in the file ddpg_agent.py. The neural network architectures for both the actor and the critic (the critic being a Q-network) are defined in model.py.
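With only standard feedforward layers, the actor maps a 33-dimensional state to a 4-dimensional action and the critic maps a state-action pair to a scalar Q-value. The framework-free sketch below shows those shapes only. The hidden size of 64, the ReLU activations, the initialization, and feeding the action into the critic's first layer are assumptions rather than the architecture in model.py. A real implementation would use a deep learning library so that the weights can be trained by backpropagation.

```python
import math
import random
from typing import List

Vector = List[float]

class Linear:
    """Fully connected layer y = W x + b with small random initial weights."""

    def __init__(self, n_in: int, n_out: int, rng: random.Random):
        bound = 1.0 / math.sqrt(n_in)
        self.w = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [rng.uniform(-bound, bound) for _ in range(n_out)]

    def __call__(self, x: Vector) -> Vector:
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(self.w, self.b)]

def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]

class Actor:
    """Deterministic policy mu(s): state -> action; tanh keeps each entry in [-1, 1]."""

    def __init__(self, state_size: int = 33, action_size: int = 4, hidden: int = 64, seed: int = 0):
        rng = random.Random(seed)
        self.fc1 = Linear(state_size, hidden, rng)
        self.fc2 = Linear(hidden, hidden, rng)
        self.out = Linear(hidden, action_size, rng)

    def __call__(self, state: Vector) -> Vector:
        h = relu(self.fc2(relu(self.fc1(state))))
        return [math.tanh(v) for v in self.out(h)]

class Critic:
    """Action value Q(s, a): state and action are concatenated and mapped to one scalar."""

    def __init__(self, state_size: int = 33, action_size: int = 4, hidden: int = 64, seed: int = 0):
        rng = random.Random(seed)
        self.fc1 = Linear(state_size + action_size, hidden, rng)
        self.fc2 = Linear(hidden, hidden, rng)
        self.out = Linear(hidden, 1, rng)

    def __call__(self, state: Vector, action: Vector) -> float:
        h = relu(self.fc2(relu(self.fc1(state + action))))
        return self.out(h)[0]

actor, critic = Actor(), Critic()
state = [0.0] * 33
action = actor(state)          # 4 torques, each in [-1, 1]
q_value = critic(state, action)
print(len(action), type(q_value).__name__)  # 4 float
```

Many DDPG implementations instead inject the action into the critic's second layer. Concatenating at the input is used here only to keep the sketch short.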