nd893 Deep Reinforcement Learning - Project 3 - Collaboration and Competition
Work on the Tennis environment and solve it using deep reinforcement learning models for multi-agent continuous control.
In this environment, two agents control rackets to bounce a ball over a net. If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets a ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01. Thus, the goal of each agent is to keep the ball in play.
The observation space consists of 8 variables corresponding to the position and velocity of the ball and racket. Each agent receives its own, local observation. Two continuous actions are available, corresponding to movement toward (or away from) the net, and jumping.
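For reference, here is a minimal sketch of driving the environment through Udacity's `unityagents` wrapper (the file name is a placeholder for whichever build you download below):

```python
from unityagents import UnityEnvironment
import numpy as np

# File path is a placeholder; point it at the Tennis build for your OS.
env = UnityEnvironment(file_name="Tennis.app")
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=True)[brain_name]
num_agents = len(env_info.agents)            # two competing agents
states = env_info.vector_observations        # one local observation per agent

# Take one random step: each agent outputs 2 continuous actions in [-1, 1].
actions = np.clip(np.random.randn(num_agents, 2), -1, 1)
env_info = env.step(actions)[brain_name]
next_states = env_info.vector_observations
rewards = env_info.rewards                   # +0.1 / -0.01 per the rules above
dones = env_info.local_done

env.close()
```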
I wrote this code in the Udacity online workspace and am still figuring out how to build the environment on my own machines. (The download instructions below come from online resources and are not fully tested yet.)
The environment can be downloaded from one of the links below, depending on your operating system:
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here
- For AWS: To train the agent on AWS (without an enabled virtual screen), use this link to obtain the "headless" version of the environment. The agent cannot be watched without a virtual screen, but it can be trained. (To watch the agent, follow the instructions to enable a virtual screen, then download the Linux environment above.)
Run `Tennis.ipynb` for step-by-step details.
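The notebook's training loop follows the usual episodic pattern; here is a sketch assuming hypothetical agent methods `act`, `step`, and `reset` (not necessarily the exact names used in the notebook). The episode score is the maximum over the two agents' scores, per the project's scoring convention.

```python
from collections import deque
import numpy as np

def train(env, agent, n_episodes=2000):
    brain_name = env.brain_names[0]
    scores_window = deque(maxlen=100)        # trailing 100-episode scores
    for i_episode in range(1, n_episodes + 1):
        env_info = env.reset(train_mode=True)[brain_name]
        states = env_info.vector_observations
        agent.reset()                        # e.g. reset exploration noise
        scores = np.zeros(len(env_info.agents))
        while True:
            actions = agent.act(states)      # one action vector per agent
            env_info = env.step(actions)[brain_name]
            agent.step(states, actions, env_info.rewards,
                       env_info.vector_observations, env_info.local_done)
            scores += env_info.rewards
            states = env_info.vector_observations
            if np.any(env_info.local_done):
                break
        scores_window.append(np.max(scores))  # episode score = max over agents
        print(f"Episode {i_episode}  avg score (last 100): "
              f"{np.mean(scores_window):.3f}")
```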
`model.py` contains the neural network classes for the Actor and Critic function approximators.
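A rough sketch of what such classes typically look like in a DDPG-style setup (layer sizes here are illustrative assumptions, not the repository's actual values):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Deterministic policy: maps a state to a continuous action in [-1, 1]."""
    def __init__(self, state_size, action_size, fc1=256, fc2=128):
        super().__init__()
        self.fc1 = nn.Linear(state_size, fc1)
        self.fc2 = nn.Linear(fc1, fc2)
        self.fc3 = nn.Linear(fc2, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return torch.tanh(self.fc3(x))   # bound actions to [-1, 1]

class Critic(nn.Module):
    """Q-function: maps a (state, action) pair to a scalar value."""
    def __init__(self, state_size, action_size, fc1=256, fc2=128):
        super().__init__()
        self.fc1 = nn.Linear(state_size, fc1)
        self.fc2 = nn.Linear(fc1 + action_size, fc2)  # inject action after fc1
        self.fc3 = nn.Linear(fc2, 1)

    def forward(self, state, action):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(torch.cat([x, action], dim=1)))
        return self.fc3(x)
```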
`MADDPG_agent.py` implements Multi-Agent Deep Deterministic Policy Gradients, as described in the MADDPG paper. In this model, each agent is itself modeled as a Deep Deterministic Policy Gradient agent (see the DDPG paper).
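The key idea is centralized training with decentralized execution: during training, each agent's critic sees the observations and actions of all agents, while each actor acts only on its own local observation. Below is a sketch of one critic update under that scheme; all attribute names (`actor_target`, `critic_target`, `critic_local`, `critic_optimizer`) are assumptions for illustration, not the repository's actual identifiers.

```python
import torch
import torch.nn.functional as F

def critic_update(agent_i, agents, batch, gamma=0.99):
    """One MADDPG critic update for agent `agent_i` (sketch; names assumed).

    `batch` holds joint transitions as tensors stacked over agents,
    e.g. states has shape (batch, n_agents, state_size).
    """
    states, actions, rewards, next_states, dones = batch
    B = states.shape[0]

    with torch.no_grad():
        # Centralized: target actions come from every agent's target actor...
        next_actions = torch.cat(
            [a.actor_target(next_states[:, j]) for j, a in enumerate(agents)],
            dim=1)
        # ...and the critic scores the full joint (state, action).
        q_next = agents[agent_i].critic_target(
            next_states.reshape(B, -1), next_actions)
        q_target = (rewards[:, agent_i:agent_i + 1]
                    + gamma * q_next * (1 - dones[:, agent_i:agent_i + 1]))

    q_expected = agents[agent_i].critic_local(
        states.reshape(B, -1), actions.reshape(B, -1))
    loss = F.mse_loss(q_expected, q_target)

    agents[agent_i].critic_optimizer.zero_grad()
    loss.backward()
    agents[agent_i].critic_optimizer.step()
```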
Please refer to `REPORT.md` for the full write-up. (TBD; it needs more time.)