DPT-VO: Dense Prediction Transformer for Scale Estimation in Monocular Visual Odometry

License: MIT

Official repository of the paper "Dense Prediction Transformer for Scale Estimation in Monocular Visual Odometry"

Abstract

Monocular visual odometry consists of the estimation of the position of an agent through images of a single camera, and it is applied in autonomous vehicles, medical robots, and augmented reality. However, monocular systems suffer from the scale ambiguity problem due to the lack of depth information in 2D frames. This paper contributes by showing an application of the dense prediction transformer model for scale estimation in monocular visual odometry systems. Experimental results show that the scale drift problem of monocular systems can be reduced through the accurate estimation of the depth map by this model, achieving competitive state-of-the-art performance on a visual odometry benchmark.

Contents

  1. Dataset
  2. Download the DPT Model
  3. Setup
  4. Usage
  5. Evaluation

1. Dataset

Download the KITTI odometry dataset (grayscale).

In this work, we use the .jpg format. You can convert the dataset to .jpg format with png_to_jpg.py.
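The internals of png_to_jpg.py are not reproduced here. As a rough sketch only, a conversion along these lines would produce the .jpg tree. It assumes Pillow is installed and that the downloaded frames sit under a source folder that is mirrored into sequences_jpg. The function names jpg_path_for and convert_tree are hypothetical and are not taken from the repository.

```python
from pathlib import Path


def jpg_path_for(png_path: Path, src_root: Path, dst_root: Path) -> Path:
    """Map a .png under src_root to the matching .jpg path under dst_root."""
    return (dst_root / png_path.relative_to(src_root)).with_suffix(".jpg")


def convert_tree(src_root: Path, dst_root: Path, quality: int = 95) -> int:
    """Convert every .png below src_root to .jpg below dst_root; return the count."""
    from PIL import Image  # Pillow, assumed to be installed

    count = 0
    for png_path in sorted(src_root.rglob("*.png")):
        out_path = jpg_path_for(png_path, src_root, dst_root)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        with Image.open(png_path) as img:
            # JPEG stores 8-bit grayscale ("L") or RGB; convert any other mode to RGB.
            if img.mode not in ("L", "RGB"):
                img = img.convert("RGB")
            img.save(out_path, "JPEG", quality=quality)
        count += 1
    return count
```

For example, convert_tree(Path("kitti/sequences"), Path("kitti/sequences_jpg")) would mirror the folder structure, where both paths are placeholders for your own locations.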

Create a symbolic link named dataset in the project folder that points to the downloaded dataset:

  • On Windows: mklink /D <path_to_your_project>\DPT-VO\dataset <path_to_your_downloaded_dataset>
  • On Linux: ln -s <path_to_your_downloaded_dataset> <path_to_your_project>/DPT-VO/dataset

Then, the data structure should be as follows:

|---DPT-VO
    |---dataset
        |---sequences_jpg
            |---00
                |---image_0
                    |---000000.jpg
                    |---000001.jpg
                    |---...
                |---image_1
                    |---...
                |---image_2
                    |---...
                |---image_3
                    |---...
            |---01
            |---...
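Before running the code, it can help to confirm that the link resolves to the layout shown above. The following is a minimal sketch, not part of the repository: missing_dirs is a hypothetical helper, and checking only image_0 by default is an assumption.

```python
from pathlib import Path


def missing_dirs(dataset_root, sequence, cameras=("image_0",)):
    """Return the expected camera directories that do not exist for one sequence."""
    seq_dir = Path(dataset_root) / "sequences_jpg" / sequence
    expected = [seq_dir / cam for cam in cameras]
    # Path.is_dir() follows symbolic links, so a linked dataset folder is fine.
    return [d for d in expected if not d.is_dir()]
```

Calling missing_dirs("dataset", "00") from the DPT-VO folder returns an empty list when sequence 00 is in place, and the list of missing paths otherwise.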

2. Download the DPT Model

Download the trained DPT weights and save them in the weights folder.

For more details, please check the original DPT repository.

3. Setup

  • Create a virtual environment using Anaconda and activate it:
conda create -n dpt-vo python==3.8.0
conda activate dpt-vo
  • Install dependencies (with environment activated):
pip install -r requirements.txt

4. Usage

Run main.py with the following command:

python main.py -s <sequence_number>

You can also point to a different dataset location with the --data_path (-d) and --pose_path (-p) arguments:

python main.py -d <path_to_dataset> -p <path_to_gt_poses> -s <sequence_number>
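To illustrate how these options fit together, here is a sketch of an argparse parser that accepts the same flags. It is not the code in main.py: the long name --sequence, all default values, and the help strings are assumptions made for this example.

```python
import argparse


def build_parser():
    """Build a parser that accepts the -d, -p and -s options shown above."""
    parser = argparse.ArgumentParser(description="Run DPT-VO on one KITTI sequence")
    # --data_path and --pose_path are named in this README; every default
    # below and the long name --sequence are assumptions for illustration.
    parser.add_argument("-d", "--data_path", default="dataset/sequences_jpg",
                        help="folder containing the image sequences")
    parser.add_argument("-p", "--pose_path", default="dataset/poses",
                        help="folder containing the ground-truth poses")
    parser.add_argument("-s", "--sequence", default="00",
                        help="sequence number, e.g. 00")
    return parser
```

With this sketch, build_parser().parse_args(["-s", "05"]) keeps the default paths and selects sequence 05.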

5. Evaluation

The evaluation is done with the KITTI odometry evaluation toolbox. Please see the evaluation repository for details about the evaluation metrics and how to run the toolbox.
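KITTI odometry ground-truth trajectories are text files with one pose per line, where each line holds 12 values forming a row-major 3x4 [R | t] matrix. Assuming the toolbox expects estimated trajectories in the same layout (check its repository to confirm), the sketch below reads and writes a single pose line. The helper names are hypothetical and do not come from this repository or the toolbox.

```python
def parse_kitti_pose(line):
    """Split one KITTI pose line into a 3x3 rotation and a 3-vector translation."""
    values = [float(v) for v in line.split()]
    if len(values) != 12:
        raise ValueError(f"expected 12 values, got {len(values)}")
    rows = [values[0:4], values[4:8], values[8:12]]
    rotation = [row[:3] for row in rows]     # left 3x3 block
    translation = [row[3] for row in rows]   # last column
    return rotation, translation


def format_kitti_pose(rotation, translation):
    """Serialize a rotation and translation back into one KITTI pose line."""
    flat = []
    for row, t in zip(rotation, translation):
        flat.extend([*row, t])
    return " ".join(f"{v:.6e}" for v in flat)
```

A trajectory file is then one such line per frame, in frame order.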

Citation

Please cite our paper if you find this research useful in your work:

@INPROCEEDINGS{Francani2022,
    title={Dense Prediction Transformer for Scale Estimation in Monocular Visual Odometry},
    author={André O. Françani and Marcos R. O. A. Maximo},
    booktitle={2022 Latin American Robotics Symposium (LARS), 2022 Brazilian Symposium on Robotics (SBR), and 2022 Workshop on Robotics in Education (WRE)},
    days={18-21},
    month={oct},
    year={2022},
}

References

Some of the functions were borrowed and adapted from three amazing works, which are: DPT, DF-VO, and monoVO.