This repository contains the code for the paper Learning to Fly in Seconds (arXiv, IEEE Xplore). It allows to train end-to-end control policies using deep reinforcement learning. The training is done in simulation and is finished within seconds on a consumer-grade laptop. The trained policies generalize and can be deployed on real quadrotors:
The main dependency is the RLtools deep reinforcement learning library.
We provide a pre-built Docker image with a simple web interface that can be executed using a single command (given that Docker is already installed on your machine):
docker run -it --rm -p 8000:8000 arpllab/learning_to_fly
After the container is running, navigate to https://0.0.0.0:8000 and you should see something like (after starting the training):
Note that to make this Docker image compatible with a broad range of CPUs, some optimizations have been turned off. For full speed we recommend a Native installation.
With the following instructions you can also easily build the Docker image yourself. If you want to run the code on bare metal jump Native installation.
First, install Docker on your machine. Then move to the original directory learning_to_fly
and build the Docker image:
docker build -t arpllab/learning_to_fly .
If desired you can also build the container for building the firmware:
docker build -t arpllab/learning_to_fly_build_firmware -f Dockerfile_build_firmware .
After that you can run it using e.g.:
docker run -it --rm -p 8000:8000 arpllab/learning_to_fly
This will open the port 8000
for the UI of the training program and run it inside the container.
Navigate to https://0.0.0.0:8000 with your browser, and you should see something like in the screenshot above (after starting the training).
The training UI configuration does not log data by default. If you want to inspect the training data run:
docker run -it --rm -p 6006:6006 arpllab/learning_to_fly training_headless
Navigate to https://0.0.0.0:6006 with your browser to investigate the Tensorboard logs.
If you would like to benchmark the training speed you can use:
docker run -it --rm arpllab/learning_to_fly training_benchmark
This is the fastest configuration, without logging, UI, checkpointing etc.
Clone this repository:
git clone https://github.com/arplaboratory/learning-to-fly learning_to_fly
cd learning_to_fly
Then instantiate the RLtools
submodule:
git submodule update --init -- external/rl_tools
cd external/rl_tools
Then instantiate some dependencies of RLtools
(for conveniences like checkpointing, Tensorboard logging, testing, etc.):
git submodule update --init -- external/cli11 external/highfive external/json/ external/tensorboard tests/lib/googletest/
sudo apt update && sudo apt install libhdf5-dev libopenblas-dev protobuf-compiler libprotobuf-dev libboost-all-dev
As an alternative to openblas you can also install Intel MKL which in our experience is significantly faster than OpenBLAS.
brew install hdf5 protobuf boost
Please make sure that brew
links the libraries correctly. If not you might have to link e.g. protobuf
manually using brew link protobuf
.
Going back to the main directory (learning_to_fly
), we can now configure the build of the code:
cd ../../
mkdir build
cd build
- Ubuntu + OpenBLAS:
cmake .. -DCMAKE_BUILD_TYPE=Release -DRL_TOOLS_BACKEND_ENABLE_OPENBLAS:BOOL=ON
- Ubuntu + MKL:
cmake .. -DCMAKE_BUILD_TYPE=Release -DRL_TOOLS_BACKEND_ENABLE_MKL:BOOL=ON
- macOS (tested on Sonoma):
cmake .. -DCMAKE_BUILD_TYPE=Release
Finally, we can build the targets:
cmake --build . -j8
After successfully building the targets, we can run the code (in the original directory learning_to_fly
):
cd ..
./build/src/training_headless
While this is running, you should be able to see training metrics using Tensorboard
If not already installed:
python3 -m pip install tensorboard
Then from the original directory learning_to_fly
:
tensorboard --logdir=logs
To run the training with the UI, we download the JavaScript dependencies in the form of the two files three.module.js
and OrbitControls.js
:
cd src/ui
./get_dependencies.sh
After that we can execute the UI binary from the root folder:
cd ../../
./build/src/ui 0.0.0.0 8000
Now you should be able to navigate to http://0.0.0.0:8000 in your browser and start the training.
To run the benchmark (with UI, checkpointing and Tensorboard logging turned off):
sudo nice -n -20 ./build/src/training_benchmark
Train a policy, e.g. using the Docker image with the UI:
docker run -it --rm -p 8000:8000 -v $(pwd)/checkpoints:/learning_to_fly/checkpoints arpllab/learning_to_fly
The checkpoints are placed in the current working directory's checkpoints
folder. Inspect the logs of the container to find the path of the final log, e.g., checkpoints/multirotor_td3/2023_11_16_14_46_38_d+o+a+r+h+c+f+w+e+_002/actor_000000000300000.h
.
We can mount this file into the container arpllab/learning_to_fly_build_firmware
for building the firmware, e.g.:
docker run -it --rm -v $(pwd)/checkpoints/multirotor_td3/2023_11_16_14_46_38_d+o+a+r+h+c+f+w+e+_002/actor_000000000300000.h:/controller/data/actor.h:ro -v $(pwd)/build_firmware:/output arpllab/learning_to_fly_build_firmware
This should build the firmware using the newly trained policy and output the binary to build_firmware/cf2.bin
. After that we can use the cfclient
package to flash the firmware (find the installation instructions here)
cfloader flash build_firmware/cf2.bin stm32-fw -w radio://0/80/2M
Note: Running the learned controller is at your own risk!
Once the firmware is flashed, use the cfclient UI to connect to the Crazyflie. To fly you should have some system for position estimation. In our tests using the Crazyflie 2.1 and the Optical Flow Deck v2 10/10 seeds yielded policies that could successfully take-off and hover. We recommend using a game-controller to use the "hover button" as a Dead man's switch. The way our controller is configured by default, it reacts to the "hover button" as a signal to be activated. By default, it will shortly idle the motors and then pass through the RPM outputs of the policy. The default setpoint is 0.3m above the position of the quadrotor where the hover-button was pressed. Also make sure that the yaw angle is low before taking off because we noticed some agressive maneuvres when the yaw angle is too high which in combination with the ground effect might lead to a crash.
If you find a policy to take-off and hover well, you can change the parameter rlt.wn
to 4
which puts the controller into figure-eight tracking mode. The scale and interval ([s]) can be adjusted by the parameters rlt.fes
and rlt.fei
respectively. The height at which the figure-eight is executed is set by rlt.target_z_fe
.
If you don't have a gamepad available and for better trajectory tracking (taking off first -> then tracking) you can use the scripts/trigger.py
script. It has three modes hover_original (just for testing)
, takeoff_and_switch
, hover_learned
, and trajectory_tracking
. In all modes the enter
key is used as the dead man's switch so that if anything goes wrong you can just release it and the motors should turn off.
Since taking-off has a big sim2real gap (ground-effect, z-ranger/optical-flow estimation behavior) and a large setpoint error, the most conservative option (and recommended first step) is to take-off with the original controller and then switch to the learned policy mid-air. This is accomplished by the takeoff_and_switch
mode which takes off with the original controller and switches to the learned policy after three seconds. The height can be specified with --height
.
python3 scripts/trigger.py --mode takeoff_and_switch
If the hovering is stable you can also try taking off using the policy. The hover_learned
mode uses the trained policy to take-off and hover (the height can be specified with --height
):
python3 scripts/trigger.py --mode hover_learned
trajectory_tracking
takes off using the original controller first and after a timeout (3s
by default) switches to the learned policy to do the trajectory tracking. The interval (5.5s
by default) and the scale (1x
by default, note the figure eight is 2mx1m
by default) can be adjusted using the parameters --trajectory-scale
and --trajectory-interval
:
python3 scripts/trigger.py --mode trajectory_tracking --trajectory-scale 0.3
Note: If you moved the Crazyflie it is best to restart it before flying (in the same position and orientation as for the intended take-off) because the policy targets to regulate the global yaw to 0
. If there is a large offset (e.g. >90 degree
) this might trigger a strong response.
When using our code/setup in an academic work please cite our publication using the following Bibtex citation:
@ARTICLE{eschmann2024learning,
author={Eschmann, Jonas and Albani, Dario and Loianno, Giuseppe},
journal={IEEE Robotics and Automation Letters},
title={Learning to Fly in Seconds},
year={2024},
volume={},
number={},
pages={1-8},
keywords={Training;Quadrotors;Taxonomy;Complexity theory;Task analysis;Rotors;Motors;Aerial Systems: Applications;Machine Learning for Robot Control;Reinforcement Learning},
doi={10.1109/LRA.2024.3396025}
}