This is Project 2 of the Udacity Deep Reinforcement Learning Nanodegree. The objective is to create a Deep Deterministic Policy Gradient (DDPG) agent that maximizes the reward in the Unity ML-Agents based Reacher continuous-control environment.
In this environment, the Agent controls a double-jointed arm that can move to target locations. A reward of +0.1 is provided for each step that the agent's hand is in the goal location. Thus, the goal of the agent is to maintain its position at the target location for as many time steps as possible.
The observation space consists of 33 variables corresponding to the position, rotation, velocity, and angular velocities of the arm. Each action is a vector of four numbers, corresponding to the torque applied to the two joints. Every entry in the action vector should be a number between -1 and 1.
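As a minimal sketch of the action constraint described above, the snippet below clamps a four-entry action vector into the range [-1, 1] before it would be sent to the environment. The helper name `clip_action` is hypothetical and is not taken from this repository; the project's own agent may enforce the bound differently (for example, with a bounded output activation in the actor network).

```python
def clip_action(action, low=-1.0, high=1.0):
    """Clamp every entry of a 4-element action vector into [low, high].

    Hypothetical helper for illustration; not part of this repository.
    """
    if len(action) != 4:
        raise ValueError("expected an action vector with 4 entries")
    return [max(low, min(high, float(a))) for a in action]


# Example: entries outside the valid range are clamped to the nearest bound.
raw_action = [1.7, -0.25, -3.0, 0.5]
clipped = clip_action(raw_action)
print(clipped)  # [1.0, -0.25, -1.0, 0.5]
```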
The environment is considered to be solved when the Agent gets an average score of +30 over 100 consecutive episodes.
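The solved criterion can be checked with a rolling window over episode scores. The sketch below is an illustration only: the function name `first_solved_episode` and the example scores are assumptions, not code or results from this project.

```python
from collections import deque


def first_solved_episode(scores, target=30.0, window=100):
    """Return the 1-based episode at which the mean of the last `window`
    scores first reaches `target`, or None if that never happens.

    Hypothetical helper for illustration; not part of this repository.
    """
    recent = deque(maxlen=window)  # keeps only the most recent `window` scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None


# Made-up scores: 50 weak episodes followed by 150 strong ones.
example_scores = [10.0] * 50 + [35.0] * 150
print(first_solved_episode(example_scores))  # 130
```

With these made-up scores, the 100-episode average first reaches 30 at episode 130, when the window holds 20 episodes scoring 10 and 80 episodes scoring 35.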
Note: This project uses a simulator provided by Udacity which is similar but not identical to the Reacher environment on the Unity ML-Agents GitHub page.
- Install project dependencies by following the instructions in Installation_Guide.md.
- Download the environment from one of the links below. You need only select the environment that matches your operating system:
  - Version 1: One (1) Agent:
    - Linux: click here
    - Mac OSX: click here
    - Windows (32-bit): click here
    - Windows (64-bit): click here
  - Version 2: Twenty (20) Agents: (Note that the project has been implemented using Version 1)
    - Linux: click here
    - Mac OSX: click here
    - Windows (32-bit): click here
    - Windows (64-bit): click here
(For Windows users) Check out this link if you need help with determining if your computer is running a 32-bit version or 64-bit version of the Windows operating system.
(For AWS) If you'd like to train the agent on AWS (and have not enabled a virtual screen), then please use this link (version 1) or this link (version 2) to obtain the "headless" version of the environment. You will not be able to watch the agent without enabling a virtual screen, but you will be able to train the agent. (To watch the agent, you should follow the instructions to enable a virtual screen, and then download the environment for the Linux operating system above.)
- Place the file in the /data directory, and unzip the file.
Follow these steps to train your agent:
- Clone this GitHub repository:

      git clone https://github.com/anubhavshrimal/Continuous_Control_Udacity_DRLND_P2.git
      cd Continuous_Control_Udacity_DRLND_P2/

- Activate the conda environment where you installed the dependencies and start Jupyter Notebook:

      conda activate drlnd
      jupyter notebook

- Open Continuous_Control.ipynb in your browser and run all the cells of the notebook.
- checkpoint_actor.pth and checkpoint_critic.pth are the pre-trained model weights for the Agent, which can be used to further train the Agent or to see how the trained agent performs in the environment.
- Continuous_Control.ipynb is the IPython notebook that trains the Agent in the Reacher environment.
- The ddpg folder contains the implementation of the Agent and the actor and critic models.
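Before running the notebook, it can help to confirm that the files listed above are present in the cloned repository. The following is a small sketch using only the Python standard library; the helper name `missing_items` is hypothetical, and it assumes the items sit at the top level of the repository.

```python
from pathlib import Path

# Names taken from the file list above; their top-level location is an assumption.
EXPECTED_ITEMS = [
    "checkpoint_actor.pth",
    "checkpoint_critic.pth",
    "Continuous_Control.ipynb",
    "ddpg",
]


def missing_items(repo_root):
    """Return the expected files or folders that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_ITEMS if not (root / name).exists()]


# Run from the repository root; an empty list means everything was found.
print(missing_items("."))
```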
The algorithm and hyper-parameter details are described in Report.md.