This project solves the Unity ML-Agents Reacher environment using the Deep Deterministic Policy Gradient (DDPG) algorithm. The code is written in PyTorch.
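For readers unfamiliar with DDPG, the sketch below shows the core update step: regress the critic toward a TD target computed with the target networks, update the actor along the critic's gradient, then soft-update the targets. This is a generic illustration, not the code in this repository; the network objects, optimizers, and the `GAMMA`/`TAU` values are placeholder assumptions.

```python
# Minimal sketch of one DDPG update step (generic illustration, not the
# notebook's exact code).  All inputs are batched tensors of shape (N, ...);
# actor/critic and their target copies are assumed PyTorch modules.
import torch
import torch.nn.functional as F

GAMMA, TAU = 0.99, 1e-3   # assumed discount factor and soft-update rate

def ddpg_update(states, actions, rewards, next_states, dones,
                actor, critic, actor_target, critic_target,
                actor_opt, critic_opt):
    # ---- critic (value) update: TD target from the target networks ----
    with torch.no_grad():
        next_actions = actor_target(next_states)
        q_targets = rewards + GAMMA * critic_target(next_states, next_actions) * (1 - dones)
    critic_loss = F.mse_loss(critic(states, actions), q_targets)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # ---- actor (policy) update: maximize the critic's value of mu(s) ----
    actor_loss = -critic(states, actor(states)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # ---- soft-update the target networks toward the local networks ----
    for target, local in ((actor_target, actor), (critic_target, critic)):
        for t_param, l_param in zip(target.parameters(), local.parameters()):
            t_param.data.copy_(TAU * l_param.data + (1.0 - TAU) * t_param.data)
```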
In the Reacher environment, a double-jointed arm moves to target locations. At each time step, a reward of +0.1 is earned while the agent's hand is in the target location. The state space has 33 dimensions, and the action space consists of 4 continuous actions (each between -1 and 1) corresponding to the torques applicable to the two joints. Twenty identical agents are present in the environment.
The problem is considered solved with an average score of +30 (over all 20 agents) over 100 consecutive episodes.
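A small sketch of the bookkeeping behind this criterion (the variable and function names below are illustrative, not taken from the notebook): each episode's score is the return averaged over the 20 agents, and the task counts as solved once the mean of the last 100 episode scores reaches +30.

```python
# Illustrative sketch of the "solved" check; names are hypothetical.
from collections import deque
import numpy as np

scores_window = deque(maxlen=100)   # scores of the last 100 episodes

def record_episode(agent_returns):
    """agent_returns: array of 20 summed rewards, one per agent."""
    episode_score = np.mean(agent_returns)           # average over the 20 agents
    scores_window.append(episode_score)
    solved = len(scores_window) == 100 and np.mean(scores_window) >= 30.0
    return episode_score, solved
```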
The dependencies are provided inside the folder `python`. This folder is adapted from the ML-Agents repository but also includes the additional pip packages needed. The Linux headless version of the Unity environment used in this project is in the folder `Reacher_Linux_NoVis` (a loading sketch follows the download links below). For other operating systems, the environment files can be downloaded from:
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here
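The sketch below shows how the headless Linux build can be loaded and its spaces inspected, assuming the `unityagents` package shipped in the `python` folder. The executable path is an assumption based on the folder name; adjust it for your OS and build.

```python
# Sketch of loading the environment and checking its dimensions.
# Assumes the `unityagents` package and the Linux NoVis build path below.
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="Reacher_Linux_NoVis/Reacher.x86_64")
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

env_info = env.reset(train_mode=True)[brain_name]
num_agents = len(env_info.agents)                    # expected: 20
state_size = env_info.vector_observations.shape[1]   # expected: 33
action_size = brain.vector_action_space_size         # expected: 4
print(num_agents, state_size, action_size)
```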
The Jupyter notebook `Continuous_Control.ipynb` contains all the code to install the dependencies, train the agent using DDPG, save the model weights, and generate the report plot. Saved weights from one successful training run can be found in `actor_checkpoint.pth` (the actor/policy network) and `critic_checkpoint.pth` (the critic/value network).
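A saved checkpoint can be restored along these lines. The `Actor` and `Critic` classes and the `model` module are hypothetical stand-ins for whatever network definitions the notebook uses; the constructor arguments follow the environment description above.

```python
# Hypothetical sketch of restoring the saved weights; class and module
# names are assumptions, only the checkpoint filenames come from this repo.
import torch
from model import Actor, Critic   # assumed module/class names, not verified

actor = Actor(state_size=33, action_size=4)
critic = Critic(state_size=33, action_size=4)
actor.load_state_dict(torch.load("actor_checkpoint.pth", map_location="cpu"))
critic.load_state_dict(torch.load("critic_checkpoint.pth", map_location="cpu"))
actor.eval()
critic.eval()
```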
Details of the implementation and results can be found in the report.