nd893 Deep Reinforcement Learning - Project 3 - Collaboration and Competition
Work on the Tennis environment and solve it using deep reinforcement learning models for multi-agent continuous control.
In this environment, two agents control rackets to bounce a ball over a net. If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets a ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01. Thus, the goal of each agent is to keep the ball in play.
The observation space consists of 8 variables corresponding to the position and velocity of the ball and racket. Each agent receives its own, local observation. Two continuous actions are available, corresponding to movement toward (or away from) the net, and jumping.
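For reference, here is a minimal sketch of driving the environment through Udacity's `unityagents` wrapper (the file name is a placeholder for whichever build you download below):

```python
from unityagents import UnityEnvironment
import numpy as np

# File path is a placeholder; point it at the Tennis build for your OS.
env = UnityEnvironment(file_name="Tennis.app")
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=True)[brain_name]
num_agents = len(env_info.agents)            # two competing agents
states = env_info.vector_observations        # one local observation per agent

# Take one random step: each agent outputs 2 continuous actions in [-1, 1].
actions = np.clip(np.random.randn(num_agents, 2), -1, 1)
env_info = env.step(actions)[brain_name]
next_states = env_info.vector_observations
rewards = env_info.rewards                   # +0.1 / -0.01 per the rules above
dones = env_info.local_done

env.close()
```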
I wrote this code in the Udacity online workspace and am still figuring out how to build the environment on my own machines. (The download instructions below come from online resources and are not fully tested yet.)
The environment can be downloaded from one of the links below, depending on your operating system:
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here
- For AWS: To train the agent on AWS (without an enabled virtual screen), use this link to obtain the "headless" version of the environment. The agent cannot be watched without a virtual screen, but it can be trained. (To watch the agent, follow the instructions to enable a virtual screen, then download the Linux environment above.)
Run `Tennis.ipynb` for step-by-step details.
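The notebook's training loop follows the usual episodic pattern; here is a sketch assuming hypothetical agent methods `act`, `step`, and `reset` (not necessarily the exact names used in the notebook). The episode score is the maximum over the two agents' scores, per the project's scoring convention.

```python
from collections import deque
import numpy as np

def train(env, agent, n_episodes=2000):
    brain_name = env.brain_names[0]
    scores_window = deque(maxlen=100)        # trailing 100-episode scores
    for i_episode in range(1, n_episodes + 1):
        env_info = env.reset(train_mode=True)[brain_name]
        states = env_info.vector_observations
        agent.reset()                        # e.g. reset exploration noise
        scores = np.zeros(len(env_info.agents))
        while True:
            actions = agent.act(states)      # one action vector per agent
            env_info = env.step(actions)[brain_name]
            agent.step(states, actions, env_info.rewards,
                       env_info.vector_observations, env_info.local_done)
            scores += env_info.rewards
            states = env_info.vector_observations
            if np.any(env_info.local_done):
                break
        scores_window.append(np.max(scores))  # episode score = max over agents
        print(f"Episode {i_episode}  avg score (last 100): "
              f"{np.mean(scores_window):.3f}")
```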
`model.py` contains the neural network classes for the Actor and Critic function approximators.
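A rough sketch of what such classes typically look like in a DDPG-style setup (layer sizes here are illustrative assumptions, not the repository's actual values):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Deterministic policy: maps a state to a continuous action in [-1, 1]."""
    def __init__(self, state_size, action_size, fc1=256, fc2=128):
        super().__init__()
        self.fc1 = nn.Linear(state_size, fc1)
        self.fc2 = nn.Linear(fc1, fc2)
        self.fc3 = nn.Linear(fc2, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return torch.tanh(self.fc3(x))   # bound actions to [-1, 1]

class Critic(nn.Module):
    """Q-function: maps a (state, action) pair to a scalar value."""
    def __init__(self, state_size, action_size, fc1=256, fc2=128):
        super().__init__()
        self.fc1 = nn.Linear(state_size, fc1)
        self.fc2 = nn.Linear(fc1 + action_size, fc2)  # inject action after fc1
        self.fc3 = nn.Linear(fc2, 1)

    def forward(self, state, action):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(torch.cat([x, action], dim=1)))
        return self.fc3(x)
```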
`MADDPG_agent.py` implements Multi-Agent Deep Deterministic Policy Gradients, as described in the MADDPG paper. In this model, each agent is itself modeled as a Deep Deterministic Policy Gradient agent (see the DDPG paper).
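The key idea is centralized training with decentralized execution: during training, each agent's critic sees the observations and actions of all agents, while each actor acts only on its own local observation. Below is a sketch of one critic update under that scheme; all attribute names (`actor_target`, `critic_target`, `critic_local`, `critic_optimizer`) are assumptions for illustration, not the repository's actual identifiers.

```python
import torch
import torch.nn.functional as F

def critic_update(agent_i, agents, batch, gamma=0.99):
    """One MADDPG critic update for agent `agent_i` (sketch; names assumed).

    `batch` holds joint transitions as tensors stacked over agents,
    e.g. states has shape (batch, n_agents, state_size).
    """
    states, actions, rewards, next_states, dones = batch
    B = states.shape[0]

    with torch.no_grad():
        # Centralized: target actions come from every agent's target actor...
        next_actions = torch.cat(
            [a.actor_target(next_states[:, j]) for j, a in enumerate(agents)],
            dim=1)
        # ...and the critic scores the full joint (state, action).
        q_next = agents[agent_i].critic_target(
            next_states.reshape(B, -1), next_actions)
        q_target = (rewards[:, agent_i:agent_i + 1]
                    + gamma * q_next * (1 - dones[:, agent_i:agent_i + 1]))

    q_expected = agents[agent_i].critic_local(
        states.reshape(B, -1), actions.reshape(B, -1))
    loss = F.mse_loss(q_expected, q_target)

    agents[agent_i].critic_optimizer.zero_grad()
    loss.backward()
    agents[agent_i].critic_optimizer.step()
```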
Please refer to `REPORT.md` for the full write-up. (TBD; it needs more time.)