This is the second project in the Udacity Deep Reinforcement Learning Nanodegree. The task is to train a double-jointed arm to reach target locations.
The training environment is the Unity ML-Agents Reacher environment.
In this environment, a double-jointed arm can move to target locations. A reward of +0.1 is provided for each step that the agent's hand is in the goal location. That is, the arm must keep its hand at the target location for as many time steps as possible.
The observation space consists of 33 variables corresponding to position, rotation, velocity, and angular velocities of the arm. Each action is a vector with four numbers, corresponding to torque applicable to two joints. Every entry in the action vector should be a number between -1 and 1.
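As a rough illustration of that interface (a sketch, assuming the unityagents package and a single-agent Reacher build; the file name is a placeholder for wherever you put the simulator), the observation and action shapes can be inspected like this:

```python
import numpy as np
from unityagents import UnityEnvironment

# Path to the downloaded simulator is an assumption; point it at your build.
env = UnityEnvironment(file_name="Reacher.app")
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=True)[brain_name]
state = env_info.vector_observations[0]   # 33-dimensional observation
print("State size:", state.shape)         # (33,)

# A random action: 4 torque values, each clipped to [-1, 1]
action = np.clip(np.random.randn(4), -1, 1)
env_info = env.step(action)[brain_name]
reward = env_info.rewards[0]              # +0.1 while the hand is in the goal
done = env_info.local_done[0]

env.close()
```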
Follow the Udacity DRL ND dependency installation instructions here.
Be sure to install Unity ML-Agents, NumPy, and PyTorch.
Download a prebuilt simulator:

- Linux: click here
- Mac OSX: click here
- Windows (64-bit): click here
Place the downloaded file in the same directory as this repository's contents.
Run the DDPG_Continuous_Control.ipynb notebook using the drlnd kernel to train the DDPG agent.
Use the ddpg function to perform the training. It returns a dictionary of relevant internal variables that can be fed back into the function to continue training where it left off. Experiment with this: change hyperparameters between training runs to build your own intuition.
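A minimal sketch of that resume mechanism, assuming the ddpg function defined in the notebook; the argument names (n_episodes, resume_from) are placeholders for whatever the notebook actually uses:

```python
# First run: train for a while. The returned dictionary is assumed to hold
# the agent, the scores collected so far, and any other internal state.
result = ddpg(n_episodes=200)

# Optionally adjust hyperparameters here (learning rates, noise scale, ...),
# then feed the dictionary back in to continue where training left off.
result = ddpg(n_episodes=200, resume_from=result)
```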
The problem is considered solved when the agent reaches an average score of +30 over 100 consecutive episodes (see the example Reward Plot).
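For reference, that check averages the score over a 100-episode window; a small sketch, assuming scores is the per-episode score list collected during training:

```python
import numpy as np

def is_solved(scores, window=100, target=30.0):
    # True once the mean of the last `window` episode scores reaches `target`.
    return len(scores) >= window and float(np.mean(scores[-window:])) >= target
```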
Once trained, the model weights are saved in the same directory in the files checkpoint_actor.pth and checkpoint_critic.pth.
The model weights are loaded by the run_agent.ipynb notebook to run the trained agent against the simulator.
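Loading those weights follows the standard PyTorch pattern; a sketch, where the Actor and Critic class names and constructor arguments are assumptions that should mirror whatever the training notebook defines:

```python
import torch

# Hypothetical network classes; mirror the definitions from the training notebook.
actor = Actor(state_size=33, action_size=4)
critic = Critic(state_size=33, action_size=4)

actor.load_state_dict(torch.load("checkpoint_actor.pth", map_location="cpu"))
critic.load_state_dict(torch.load("checkpoint_critic.pth", map_location="cpu"))
actor.eval()  # inference mode for running the agent against the simulator
```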