Teach a Quadcopter How to Fly!
An agent built using a reinforcement learning algorithm to train and fly a quadcopter. The project implements the Deep Deterministic Policy Gradients (DDPG) algorithm.
Open and view the project using the .zip file provided or at my GitHub repository, where the project is also hosted. The starter project can be downloaded from the Udacity repository (cloned in the first step below). The project will be evaluated by a Udacity code reviewer according to the project rubric.
You will need the following setup to develop and run the project:
- Clone the repository and navigate to the downloaded folder.

      git clone https://github.com/udacity/RL-Quadcopter-2.git
      cd RL-Quadcopter-2
- Create and activate a new environment.

      conda create -n quadcop python=3.6 matplotlib numpy pandas
      source activate quadcop
- Create an IPython kernel for the `quadcop` environment.

      python -m ipykernel install --user --name quadcop --display-name "quadcop"
- Open the notebook.

      jupyter notebook Quadcopter_Project.ipynb
- Before running code, change the kernel to match the `quadcop` environment by using the drop-down menu (Kernel > Change kernel > quadcop). Then, follow the instructions in the notebook.
- You will likely need to install more pip packages to complete this project. Please curate the list of packages needed to run your project in the `requirements.txt` file in the repository (a hypothetical example follows this list).
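For illustration only, a hypothetical `requirements.txt` might look like the sketch below. The exact contents depend on what your agent implementation actually imports; the deep learning packages here are assumptions, not requirements of the starter code.

```text
# Hypothetical requirements.txt -- curate to match your own imports.
matplotlib
numpy
pandas
tensorflow   # assumed: only needed if your models are built on TensorFlow
keras        # assumed: only needed if your models are built with Keras
```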
To run the project:
- Activate the conda or Python virtual environment, then start the Jupyter notebook as described above. Open your browser and visit localhost:8888 (or the port indicated in the terminal); you should see all of the contents of the project in the `Quadcopter_Project.ipynb` notebook.
- After completing the development, press the play ▶️ icon to start the execution of cells. The output will be visible right below each respective cell.
The notebook contains the following functions and configurations:

- **Replay Buffer** to store and recall experience tuples (sketched in the first code example after this list).
- **DDPG: Actor (Policy) Model** for the copter, which is meant to map states to actions. Its custom loss function is defined using `action_gradients` and `action` (see the second code example after this list).
- **DDPG: Critic (Value) Model**, which is meant to map (state, action) pairs to their Q-values. The final output of this model is the Q-value for any given (state, action) pair; however, we also need to compute the gradient of this Q-value with respect to the corresponding action vector, which is needed for training the actor model.
- **DDPG: Agent**, built by putting together the actor and critic models. Note that we will need two copies of each model: one local and one target (the soft update between them is included in the second code example after this list).
- **Noise Model**, which uses the Ornstein–Uhlenbeck process (sketched in the first code example after this list).
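The replay buffer and the noise process are the simplest pieces. Here is a minimal sketch of both; the class names, attribute names, and default parameters are illustrative, not necessarily those used in the notebook.

```python
import random
from collections import deque, namedtuple

import numpy as np

Experience = namedtuple(
    'Experience', ['state', 'action', 'reward', 'next_state', 'done'])

class ReplayBuffer:
    """Fixed-size buffer that stores and recalls experience tuples."""

    def __init__(self, buffer_size=100000, batch_size=64):
        self.memory = deque(maxlen=buffer_size)  # oldest entries drop off
        self.batch_size = batch_size

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        # Uniform random recall breaks the temporal correlation of episodes.
        return random.sample(self.memory, k=self.batch_size)

class OUNoise:
    """Ornstein-Uhlenbeck process: mean-reverting, temporally correlated noise."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2):
        self.mu = mu * np.ones(size)
        self.theta = theta
        self.sigma = sigma
        self.reset()

    def reset(self):
        self.state = self.mu.copy()

    def sample(self):
        # dx = theta * (mu - x) + sigma * N(0, 1): drift toward mu plus diffusion.
        dx = self.theta * (self.mu - self.state) \
             + self.sigma * np.random.randn(len(self.state))
        self.state = self.state + dx
        return self.state
```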
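The coupling between the actor and critic is the subtle part: the critic exposes the gradient of its Q-value output with respect to its action input, and the actor trains on that signal through a custom loss. Below is a condensed sketch of that wiring plus the local/target soft update. It assumes Keras on a TensorFlow 1.x backend (consistent with the python=3.6 environment above); the layer sizes and variable names are illustrative, not the notebook's exact architecture.

```python
from keras import layers, models, optimizers
import keras.backend as K

state_size, action_size = 12, 4  # illustrative; the task defines the real sizes

# --- Critic: maps (state, action) pairs to Q-values ---
states = layers.Input(shape=(state_size,))
actions = layers.Input(shape=(action_size,))
net = layers.Concatenate()([
    layers.Dense(32, activation='relu')(states),
    layers.Dense(32, activation='relu')(actions)])
Q_values = layers.Dense(1)(layers.Dense(64, activation='relu')(net))
critic = models.Model(inputs=[states, actions], outputs=Q_values)
critic.compile(optimizer=optimizers.Adam(), loss='mse')

# Gradient of the Q-value w.r.t. the action input: the `action_gradients`
# signal the actor trains on.
get_action_gradients = K.function(
    inputs=[*critic.input, K.learning_phase()],
    outputs=K.gradients(Q_values, actions))

# --- Actor: maps states to actions ---
actor_states = layers.Input(shape=(state_size,))
actions_out = layers.Dense(action_size, activation='sigmoid')(
    layers.Dense(32, activation='relu')(actor_states))
actor = models.Model(inputs=actor_states, outputs=actions_out)

# Custom loss: descend -Q (i.e. ascend Q) through the chosen actions.
action_gradients = layers.Input(shape=(action_size,))
loss = K.mean(-action_gradients * actions_out)
updates = optimizers.Adam(lr=1e-4).get_updates(
    params=actor.trainable_weights, loss=loss)
train_actor = K.function(
    inputs=[actor.input, action_gradients, K.learning_phase()],
    outputs=[], updates=updates)

# --- Soft update for the local/target copies of each model ---
def soft_update(local_model, target_model, tau=0.01):
    """Blend local into target: w_target <- tau*w_local + (1-tau)*w_target."""
    new_weights = [tau * lw + (1.0 - tau) * tw for lw, tw in
                   zip(local_model.get_weights(), target_model.get_weights())]
    target_model.set_weights(new_weights)
```

In the agent's learning step, you would call `get_action_gradients` on a batch sampled from the replay buffer, feed the result to `train_actor`, and then `soft_update` each target model so it slowly tracks its local counterpart.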
Follow the instructions in the notebook; they will lead you through the project. You'll be editing the `Quadcopter_Project.ipynb` file.
Once you're done with the app, stop it gracefully using the following steps:

- Select `File -> Close and Halt` inside the Jupyter notebook.
- Press `Ctrl+C` in the CLI.
- Deactivate the conda environment:

      conda deactivate

- Delete the environment if you are done with the project:

      conda remove --name quadcop --all