Unlike a single-agent setup, the agents in this project must collaborate to maximize both of their scores.
The project uses a 2D tennis environment.
In this environment, two agents control rackets to bounce a ball over a net. If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets the ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01. Thus, the goal of each agent is to keep the ball in play, which makes this a collaborative task.
The observation space consists of 8 variables corresponding to the position and velocity of the ball and racket. Each agent receives its own local observation. Three vector observations are stacked, so each agent receives 3 × 8 = 24 values per step.
A single observation is s = [x_ball, y_ball, vx_ball, vy_ball, x_racket, y_racket, vx_racket, vy_racket]. The variable order and the axis convention have not been confirmed yet.
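As a rough illustration of the stacking, the sketch below splits a flat stacked observation into its three 8-variable frames. It assumes the frames are concatenated in order into one 24-element vector, that the last frame is the most recent, and that the variables follow the unconfirmed order listed above. The names `split_stacked_observation`, `frame_to_dict`, and `FIELDS` are hypothetical and not part of the project code.

```python
import numpy as np

FRAME_SIZE = 8    # variables per single observation
NUM_STACKED = 3   # number of stacked vector observations

# Assumed variable order; not confirmed against the environment.
FIELDS = ["x_ball", "y_ball", "vx_ball", "vy_ball",
          "x_racket", "y_racket", "vx_racket", "vy_racket"]


def split_stacked_observation(obs):
    """Reshape a flat stacked observation into (NUM_STACKED, FRAME_SIZE)."""
    obs = np.asarray(obs, dtype=np.float32)
    expected = FRAME_SIZE * NUM_STACKED
    if obs.shape != (expected,):
        raise ValueError(f"expected {expected} values, got shape {obs.shape}")
    return obs.reshape(NUM_STACKED, FRAME_SIZE)


def frame_to_dict(frame):
    """Label one 8-variable frame with the assumed field names."""
    return dict(zip(FIELDS, frame.tolist()))


# Dummy data standing in for one agent's observation.
obs = np.arange(24, dtype=np.float32)
frames = split_stacked_observation(obs)
latest = frame_to_dict(frames[-1])  # assumes the last frame is the newest
```

Printing `latest` for a real observation is one way to check the variable order and axis setup empirically.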
Two continuous actions are available, corresponding to movement toward (or away from) the net, and jumping.
A single action is a = [move_horizontal, move_vertical]. Both action values lie between -1 and 1.
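A minimal sketch of producing valid actions for both agents follows. It draws random values and clips them to the [-1, 1] range stated above. The function name `random_actions` and the constants are hypothetical, and a trained policy would replace the random draw.

```python
import numpy as np

NUM_AGENTS = 2    # two rackets
ACTION_SIZE = 2   # horizontal movement and jumping


def random_actions(rng):
    """Return one action per agent, each component clipped to [-1, 1]."""
    raw = rng.standard_normal((NUM_AGENTS, ACTION_SIZE))
    return np.clip(raw, -1.0, 1.0)


rng = np.random.default_rng(0)
actions = random_actions(rng)  # shape (2, 2): one row per agent
```

Clipping is the simplest way to respect the bounds. A policy network could instead end in a tanh layer, which keeps its outputs in the same range.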
The task is episodic, and in order to solve the environment, your agents must get an average score of +0.5 (over 100 consecutive episodes, after taking the maximum over both agents). Specifically,
- After each episode, we add up the rewards that each agent received (without discounting), to get a score for each agent. This yields 2 (potentially different) scores. We then take the maximum of these 2 scores.
- This yields a single score for each episode.
The environment is considered solved when the average of those scores over 100 consecutive episodes is at least +0.5.
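The scoring rule above can be sketched as follows. The helper names `episode_score` and `is_solved` are hypothetical, and the reward values are dummy data rather than real training output.

```python
from collections import deque

import numpy as np

SOLVED_THRESHOLD = 0.5  # required average score
WINDOW = 100            # number of consecutive episodes averaged


def episode_score(rewards_per_step):
    """Sum undiscounted rewards per agent, then take the maximum.

    rewards_per_step has shape (steps, 2): one column per agent.
    """
    totals = np.sum(np.asarray(rewards_per_step, dtype=float), axis=0)
    return float(np.max(totals))


def is_solved(recent_scores):
    """True once a full window of scores averages at least the threshold."""
    full = len(recent_scores) == WINDOW
    return bool(full and np.mean(recent_scores) >= SOLVED_THRESHOLD)


# Dummy episode: agent 0 totals 0.2, agent 1 totals 0.09.
rewards = [[0.1, 0.0], [0.0, 0.1], [0.1, -0.01]]
score = episode_score(rewards)

# Rolling window that keeps only the latest WINDOW episode scores.
recent = deque(maxlen=WINDOW)
for _ in range(WINDOW):
    recent.append(0.6)
solved = is_solved(recent)
```

Using a `deque` with `maxlen` drops the oldest score automatically, so the average always covers the most recent 100 episodes.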
Download the environment from one of the links below. You need only select the environment that matches your operating system:
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here
Place the file in the root folder of the GitHub repository, and unzip (or decompress) it.
The environment was modified by Udacity and is not identical to the original Unity Tennis environment.
To train the agents, run:
python train_tennis.py
To evaluate the trained agents, run:
python eval_tennis.py