/MonoNav

MonoNav: MAV Navigation via Monocular Depth Estimation and Reconstruction

Primary LanguagePythonMIT LicenseMIT

MonoNav: MAV Navigation via Monocular
Depth Estimation and Reconstruction

Authors: Nathaniel Simon and Anirudha Majumdar

Intelligent Robot Motion Lab, Princeton University

Project Page | Paper (arXiv) | Video |


MonoNav is a monocular navigation stack that uses RGB images and camera poses to generate a 3D reconstruction, enabling the use of conventional planning techniques. MonoNav leverages pre-trained depth-estimation (ZoeDepth) and off-the-shelf fusion (Open3D) to generate a real-time 3D reconstruction of the environment. At each planning step, MonoNav selects from a library of motion primitives to navigate collision-free towards the goal. While the robot executes each motion primitive, new images and poses are integrated into the reconstruction. In our paper, we demonstrate MonoNav on a 37 gram micro aerial vehicle (MAV) navigating hallways at 0.5 m/s (see project page and video).

Overview

This repository contains code to run the following:
MonoNav pipeline (mononav_cf.py): Monocular hallway navigation using a Crazyflie + FPV camera, as seen in our paper. We encourage you to adapt this script to other vehicles / scenes!

We also offer scripts that break MonoNav into sub-parts, which can be run independently & offline:

  1. Data collection pipeline (collect_dataset.py): Collect images and poses from your own camera / robot.
  2. Depth estimation pipeline (estimate_depth.py): Estimate depths from RGB images using ZoeDepth.
  3. Fusion pipeline (fuse_depth.py): Fuse the estimated depth images and camera poses into a 3D reconstruction.
  4. Simulate MonoNav (simulate.py): Step through the 3D reconstruction and visualize the motion primitives chosen by the MonoNav planner. This is a useful way to replay and debug MonoNav trials.

These scripts (run in sequence) form a demo, which we highly recommend before adapting MonoNav for your system. We include a sample dataset (data/demo_hallway), so no robot is needed to run the demo!

In addition, we include the following resources in the /utils/ directory:

We hope you enjoy MonoNav!

Installation and Configuration

Clone the repository and its submodules (ZoeDepth):

git clone --recurse-submodules https://github.com/natesimon/MonoNav.git

Install the dependencies from environment.yml using mamba (fastest):

mamba env create -n mononav --file environment.yml
mamba activate mononav

or conda :

conda env create -n mononav --file environment.yml
conda activate mononav

Note: If the installation gets stuck at Solving environment: ..., we recommend updating your system, re-installing conda / miniconda, or using mamba.

Tested on: (release / driver / GPU)

  • Ubuntu 22.04 / NVIDIA 535 / RTX 4090
  • Ubuntu 22.04 / NVIDIA 535 / Titan RTX
  • Ubuntu 20.04 / NVIDIA 530 / Titan Xp
  • Ubuntu 18.04 / NVIDIA 470 / Titan Xp

If you do not have access to GPU, set device = "CPU:0" in config.yml. This will reduce the speed of both depth estimation and fusion, and may not be fast enough for real-time operations.

Demo: Out of the Box!

We include a demo dataset (data/demo_hallway) to try MonoNav out of the box - no robot needed! From a series of (occasionally noisy) images and poses, we will transform the images, estimate depth, and fuse them into a 3D reconstuction.

  1. The dataset includes just RGB images and camera poses from a Crazyflie (see Hardware):

    ├── <demo_hallway>
    │   ├── <crazyflie_poses> # camera poses
    │   ├── <crazyflie_rgb_images> # raw camera images
    
  2. Set the dataset path in config.yml. By default, data_dir: 'data/demo_hallway, but make sure to change this if you want to process your own dataset.

    data_dir: 'data/demo_hallway' # change to whichever directory you want to process
    
  3. To demonstrate ZoeDepth, run python estimate_depth.py. This reads in the crazyflie images and transforms them to match the camera intrinsics used in the ZoeDepth training dataset. This is crucial for depth estimation accuracy (see Calibration for more details). The transformed images are saved in <kinect_rgb_images> and used to estimate depth. The estimated depths are saved as numpy arrays and colormaps (for visualization) in <kinect_depth_images>. After running, take a look at the resulting images and note the loss of peripheral information as the raw images are undistorted.

  4. To demonstrate fusion, run: python fuse_depth.py. This script reads in the (transformed) images, poses, and depths, and integrates them using Open3D's TSDF Fusion. After completion, a reconstruction should be displayed with coordinate frames to mark the camera poses throughout the run. The reconstruction is saved to file as a VoxelBlockGrid (vbg.npz) and pointcloud (pointcloud.ply - which can be opened using MeshLab).

  5. Next, run python simulate.py. This loads the reconstruction (vbg.npz) and executes the MonoNav planner. The planner is executed at each of the camera poses, and does the following:

    1. visualizes (in black) the available motion primitives in the trajectory library (utils/trajlib),
    2. chooses a motion primitive according to the planner: choose_primitive() in utils/utils.py selects the primitive that makes the most progress towards goal_position while remaining min_dist2obs from all obstacles in the reconstruction,
    3. paints the chosen primitive green. simulate.py is useful for debugging and de-briefing, and also to anticipate how changes in the trajectory library or planner affect performance. For example, by changing min_dist2obs inconfig.yml, it is possible to see how increasing/decreasing the distance threshold to obstacles affects planner performance.
  6. Finally, try changing the motion primitives to see how they affect planner performance! To modify and generate the trajectory library, open utils/generate_primitives.py. Try changing num_trajectories from 7 to 11, and run generate_primitives.py. This will display the new motion primitives and update the trajectory library. Note that each motion primitive is defined by a set of gentle turns left, right, or straight. An "extension" segment is added to the primitive (but not flown) to encourage foresight in the planner. See our paper for more details. Feel free to re-run simulate.py to try out the new primitives!

The tutorial should result in the additional files added to data/demo_hallway:

├── <demo_hallway>
│   ├── <kinect_rgb_images> # images transformed to match kinect intrinsics
│   ├── <kinect_depth_images> # estimated depth (.npy for fusion and .jpg for visualization)
│   ├── vbg.npz / pointcloud.ply # reconstructions generated by fuse_depth.py

If you are unable to execute the full tutorial but want to reference the output, you can download it from Google Drive. If you have made it through the tutorials, you can try MonoNav on your own dataset!

Collect your own Dataset

To run MonoNav on your own dataset, there are two crucial steps:

  1. Collect your own dataset (RGB images and poses). We provide collect_dataset.py which works for the Crazyflie, but you may have to modify it for your system. Ensure that you are transforming the pose (rotation + translation) into the Open3D frame correctly, and saving it in homogeneous form. See get_crazyflie_pose in utils/utils.py for reference. Make sure to update data_dir in config.yml to point to your collected dataset.
  2. Provide the camera intrinsics. This is crucial for the image transformation step and can affect depth estimation quality dramatically. We recommend that you follow our provided calibration sequence to automatically generate a .json of camera intrinsics and distortion coefficients. config.yml should then be updated to point to the intrinsics json file path: camera_calibration_path: 'utils/calibration/intrinsics.json'.

With those steps complete, you can run estimate_depth.py, fuse_depth.py, and simulate.py to reconstruct and try MonoNav on your own dataset!

Camera Calibration

A key aspect to MonoNav is using a pre-trained depth estimation network on a different camera than the one used during training. The micro FPV camera (Wolfwhoop WT05) that we use has significant barrel distortion (fish-eye), and thus the images must be first undistorted to better match the camera intrinsics used to collect the training data. To maintain the metric depth estimation accuracy of the model, we must transform the input image to match the intrinsics of the training dataset. ZoeDepth is trained on NYU-Depth-v2, which used the Microsoft Kinect.

The transform_image() function in utils/utils.py performs the transformation: resizing the image and undistorting it to match the Kinect's intrinsics.

In utils/calibration, we provide scripts to generate intrinsics.json for your own camera. Steps to calibrate:

  1. Make a chessboard calibration target.
  2. take_pictures.py: Take many pictures (recommended: 80+) of the chessboard by pressing the spacebar. Saves them to utils/calibration/calibration_pictures/.
  3. calibrate.py: Based on the OpenCV sample.You need to provide several arguments, including the structure and dimensions of your chessboard target. Example:
    MonoNav/utils/calibration$ python calibrate.py -w 6 -h 8 -t chessboard --square_size=35 ./calibration_pictures/frame*.jpg
    
    
    The intrinsics are printed and saved to utils/calibration/intrinsics.json.
  4. transform.py: This script loads the intrinsics from intrinsics.json and transforms your calibration_pictures to the Kinect's dimensions (640x480) and intrinsics. This operation may involve resizing your image. The transformed images are saved in utils/calibration/transform_output and should be inspected.
  5. Finally, we recommend re-running calibration on transform_output to ensure that the intrinsics match the Kinect.
    MonoNav/utils/calibration$ python calibrate.py -w 6 -h 8 -t chessboard --square_size=35 ./transform_output/frame*.jpg
    
    
    transform.py will save the intrinsics of the transformed images to check_intrinsics.json, which should roughly match those of the Kinect:
    [[525.    0.  319.5]
    [  0.  525.  239.5]
    [  0.    0.    1. ]]
    

MonoNav: Hardware

To run MonoNav as shown in our paper, you need a monocular robot with pose (position & orientation) estimation. We used the Crazyflie 2.1 micro aerial vehicle modified with an FPV camera. Our hardware setup follows closely the one used in Princeton's Introduction to Robotics course. If you are using the Crazyflie, we recommend that you follow the Bitcraze tutorials to ensure that the vehicle flies and commmunicates properly.

List of parts:

Running mononav_cf.py

The mononav_cf.py ("cf" for "crazyflie") script performs the image transformation, depth estimation, fusion, and planning process simultaneously for goal-directed obstacle avoidance. If FLY_CRAZYFLIE: True, the Crazyflie will takeoff, if False, the pipeline will execute without the Crazyflie starting its motors (useful for testing).

After takeoff, the Crazyflie can be controlled manually by the following key commands:

w: choose MIDDLE primitive (typically FORWARD)
a: choose FIRST primitive (typically LEFT)
d: choose LAST primitive (typically RIGHT)
c: end control (stop and land)
q: end control immediately (EMERGENCY stop and land)
g: start MonoNav

Manual control is an excellent way to check that the pipeline is working, as it should produce a sensible reconstruction after landing. As mentioned in the paper, it is HIGHLY RECOMMENDED to manually fly forward 3x (press w, w, w) before starting MonoNav (press g). This is due to the narrow field of view of the transformed images, which discards peripheral information; to make an informed decision, the planner needs information collected 3 primitives ago.

After MonoNav is started, the Crazyflie will choose and execute primitives according to the planner. If collision seems imminent, you can manually choose a primitive (by w, a, d) or stop the planner (c or q). During the run, a crazyflie_trajectories.csv log is produced, which includes the frame_number and time_elapsed during replanning, as well as the chosen_traj_idx.

Future Work

MonoNav is a "work in progress" - there are many exciting directions for future work! If you find any bugs or implement any exciting features, please submit a pull request as we'd love to continue improving the system. Once you get MonoNav running on your robot, send us a video! We'd love to see it.

Areas of future work in the pipeline:

  • Run MonoNav on new platforms and in new settings!
  • Integrate with ROS for easier interoperability on new systems.
  • Integrate improved depth estimation pipelines, which might take in multiple frames or metadata to improve the depth estimate.
  • Improve the planner to treat space as unsafe UNTIL explored, which could prevent crashes into occluded obstacles. (The current planner treats unseen space as free.)
  • Improve the motion primitive control from open-loop to closed-loop, for more reliable and accurate primitive tracking.

Citation

@inproceedings{simon2023mononav,
  author    = {Nathaniel Simon and Anirudha Majumdar},
  title     = {{MonoNav: MAV Navigation via Monocular Depth Estimation and Reconstruction}},
  booktitle = {Symposium on Experimental Robotics (ISER)},
  year      = {2023},
  url       = {https://arxiv.org/abs/2311.14100}
}