
MonoDepth to ManyDepth: Self-Supervised Depth Estimation on Monocular Sequences


  1. Dataset

    • Dense Depth for Autonomous Driving (DDAD)
    • KITTI Eigen Split
    wget -i splits/kitti_archives_to_download.txt -P kitti_data/
    cd kitti_data/
    unzip "*.zip"
    cd ..
    find kitti_data/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg && rm {}'
    
    • The command above converts the PNGs to JPEGs with chroma subsampling 2x2,1x1,1x1 and deletes the original PNGs.
  2. Problem Setting

    While specialist hardware can give per-pixel depth, a more attractive approach is to require only a single RGB camera.

    We train a deep network to map from an input image to a depth map.


  3. Methods

    • Geometry Models

      The simplest representation of a camera is an image plane at a given position and orientation in space.


      The pinhole camera geometry models the camera with two sub-parameterizations: intrinsic and extrinsic parameters. Intrinsic parameters model the optics (without distortion), and extrinsic parameters model the camera's position and orientation in space. The camera's projection is described as:

      P = K [R | t]

      A 3D point is projected into an image with the following formula (homogeneous coordinates):

      λ [u, v, 1]ᵀ = K [R | t] [X, Y, Z, 1]ᵀ
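As a sanity check, the projection above can be sketched in a few lines of NumPy. The function name and the intrinsics used below are illustrative, not part of this repository:

```python
import numpy as np

def project_points(points_world, K, R, t):
    """Project Nx3 world points to pixel coordinates with a pinhole camera.

    K: 3x3 intrinsic matrix; [R | t]: extrinsic rotation and translation.
    """
    P = K @ np.hstack([R, t.reshape(3, 1)])  # 3x4 projection matrix P = K [R | t]
    X_h = np.hstack([points_world, np.ones((len(points_world), 1))])  # homogeneous coords
    x = (P @ X_h.T).T                        # homogeneous pixel coordinates (lambda u, lambda v, lambda)
    return x[:, :2] / x[:, 2:3]              # divide by depth lambda

# A point on the optical axis projects to the principal point (cx, cy).
K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0,   0.0,   1.0]])
uv = project_points(np.array([[0.0, 0.0, 10.0]]), K, np.eye(3), np.zeros(3))
print(uv)  # [[609.6 172.9]]
```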

    • Cross-View Reconstruction

    We frame the learning problem as one of novel view synthesis, training a network to predict the appearance of a target image from the viewpoint of another image using depth (disparity).

    We formulate the problem as the minimization of a photometric reprojection error at training time:

    L_p = min_{t'} pe(I_t, I_{t'→t})

    I_{t'→t} = I_{t'} ⟨ proj(D_t, T_{t→t'}, K) ⟩

    pe(I_a, I_b) = (α / 2) (1 − SSIM(I_a, I_b)) + (1 − α) ‖I_a − I_b‖₁

    Here, pe is a photometric reconstruction error, proj() returns the resulting 2D coordinates of the projected depths D_t in the source view, and ⟨⟩ is the sampling operator. For simplicity of notation we assume the pre-computed intrinsics K of all views are identical, though they can be different. α is set to 0.85.
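A minimal NumPy sketch of the photometric error pe, using a simplified single-window SSIM in place of the block-filtered SSIM used in practice (all names below are illustrative, not the repository's API):

```python
import numpy as np

def ssim_global(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM computed over the whole image as a single window."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def photometric_error(img_a, img_b, alpha=0.85):
    """pe = alpha/2 * (1 - SSIM) + (1 - alpha) * L1, matching the loss above."""
    ssim_term = (1.0 - ssim_global(img_a, img_b)) / 2.0
    l1_term = np.abs(img_a - img_b).mean()
    return alpha * ssim_term + (1.0 - alpha) * l1_term

rng = np.random.default_rng(0)
img = rng.random((64, 64))
print(photometric_error(img, img))  # identical images -> 0.0
```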


    We consider the scene structure and camera motion at the same time, where camera-pose estimation has a positive impact on monocular depth estimation. The depth and pose sub-networks are trained jointly, and the entire model is constrained by an image-reconstruction loss similar to stereo-matching methods.
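The image-reconstruction constraint amounts to an inverse warp: back-project the target pixels with the predicted depth, transform them by the predicted relative pose, project into the source view, and sample. A NumPy sketch, using nearest-neighbour sampling in place of the differentiable bilinear sampler used in practice (names and values are illustrative):

```python
import numpy as np

def inverse_warp(src_img, depth_t, K, T):
    """Synthesize the target view by sampling the source image.

    depth_t: HxW depth of the target frame; T: 4x4 pose from target to source.
    """
    H, W = depth_t.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)  # homogeneous pixels
    cam = (np.linalg.inv(K) @ pix) * depth_t.reshape(1, -1)         # back-project to 3D
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    src = K @ (T @ cam_h)[:3]                                       # project into source view
    su = np.clip(np.round(src[0] / src[2]).astype(int), 0, W - 1)   # nearest-neighbour sample
    sv = np.clip(np.round(src[1] / src[2]).astype(int), 0, H - 1)
    return src_img[sv, su].reshape(H, W)

# With an identity pose, the warp reproduces the source image exactly.
K = np.array([[50.0, 0.0, 16.0], [0.0, 50.0, 16.0], [0.0, 0.0, 1.0]])
img = np.random.default_rng(1).random((32, 32))
depth = np.full((32, 32), 5.0)
warped = inverse_warp(img, depth, K, np.eye(4))
print(np.allclose(warped, img))  # True
```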

  4. Folder

    dataset/
        2011_09_26/
        ...
        ...
    model_dataloader/
    model_layer/
    model_loss/
    model_save/
    model_test.py
    model_train.py
    model_parser.py
    model_utility.py


5. Packages

apt-get update -y
apt-get install moreutils   # or: apt-get install -y moreutils


6. Training

python model_train.py --pose_type separate --datatype kitti_eigen_zhou
python model_train.py --pose_type separate --datatype kitti_benchmark


7. Test

python model_test.py


8. Evaluation

| datatype | abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| kitti_eigen_zhou | 0.125 | 0.977 | 4.992 | 0.202 | 0.861 | 0.955 | 0.980 |
| kitti_eigen_benchmark | 0.104 | 0.809 | 4.502 | 0.182 | 0.900 | 0.963 | 0.981 |