TRI-ML/packnet-sfm

How to reproduce the result on DDAD

Opened this issue · 7 comments

Hi,
Thank you for releasing the code. I am trying to train PackNet on DDAD, but so far I cannot reproduce the reported results. I am using 8 V100 GPUs. The training command is 'CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 horovodrun -np 8 -H localhost:8 python scripts/train.py ./configs/train_ddad.yaml'. The details of my config are as follows:
model:
    name: 'SelfSupModel'
    optimizer:
        name: 'Adam'
        depth:
            lr: 0.00009
        pose:
            lr: 0.00009
    scheduler:
        name: 'StepLR'
        step_size: 30
        gamma: 0.5
    depth_net:
        name: 'PackNet01'
        version: '1A'
    pose_net:
        name: 'PoseNet'
        version: ''
    params:
        crop: ''
        min_depth: 0.0
        max_depth: 200.0
datasets:
    augmentation:
        image_shape: (384, 640)
    train:
        batch_size: 8
        num_workers: 8
        dataset: ['DGP']
        path: ['/data/ddad_train_val/ddad.json']
        split: ['train']
        depth_type: ['lidar']
        cameras: [['camera_01']]
        repeat: [5]
    validation:
        num_workers: 8
        dataset: ['DGP']
        path: ['/data/ddad_train_val/ddad.json']
        split: ['val']
        depth_type: ['lidar']
        cameras: [['camera_01']]
    test:
        num_workers: 8
        dataset: ['DGP']
        path: ['/data/ddad_train_val/ddad.json']
        split: ['val']
        depth_type: ['lidar']
        cameras: [['camera_01']]
checkpoint:
    filepath: './data/experiments'
    monitor: 'abs_rel_pp_gt'
    monitor_index: 0
    mode: 'min'

E: 50 BS: 8 - SelfSupModel LR (Adam): Depth 4.50e-05 Pose 4.50e-05

*** /data/ddad_train_val/ddad.json/val (camera_01)

METRIC      | abs_rel | sqr_rel |  rmse  | rmse_log |  a1   |  a2   |  a3
DEPTH       |  0.853  | 23.485  | 37.371 |  2.022   | 0.002 | 0.005 | 0.008
DEPTH_PP    |  0.853  | 23.542  | 37.468 |  2.025   | 0.002 | 0.004 | 0.008
DEPTH_GT    |  0.268  | 12.451  | 19.267 |  0.333   | 0.705 | 0.869 | 0.936
DEPTH_PP_GT |  0.257  | 11.199  | 18.532 |  0.324   | 0.709 | 0.873 | 0.939

Is there anything wrong with my setup? Thank you for your attention.

Hmm, can you try a few things:

  • Start from a pre-trained model (e.g. a KITTI model) to see if it diverges
  • Try another network (DepthResNet or PoseResNet)
  • Play around with the learning rate (a rough config sketch of these changes follows below)

By the way, once you get some numbers you can try submitting to our EvalAI DDAD challenge!
https://eval.ai/web/challenges/challenge-page/902/overview
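
As a rough sketch, those changes could look like this in the config. Double-check the exact keys and version strings against configs/default_config.py; the checkpoint path below is only a placeholder, and the learning rate is just an example value to experiment with:

model:
    # start from a pre-trained checkpoint (e.g. a KITTI model) instead of from scratch
    checkpoint_path: '/data/models/PackNet01_MR_selfsup_K.ckpt'   # placeholder path
    optimizer:
        name: 'Adam'
        depth:
            lr: 0.0002        # example of an alternative learning rate to try
        pose:
            lr: 0.0002
    depth_net:
        name: 'DepthResNet'   # alternative to PackNet01
        version: '18pt'       # assumed ResNet-18 variant with ImageNet pre-training
    pose_net:
        name: 'PoseResNet'    # alternative to PoseNet
        version: '18pt'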


Do you use any pre-trained weights to get the results of 0.173 (abs_rel) on DDAD and 0.111 (abs_rel) on KITTI, or do you train from scratch?

No, those are trained from scratch with PackNet. I just mentioned pre-trained weights as a way to see if there is anything wrong with the training setup that you are using.

Hi, thanks for your work.
Were the results on DDAD produced by training from scratch using the config provided here? https://github.com/TRI-ML/packnet-sfm/blob/master/configs/train_ddad.yaml

@a1600012888 Yes, that configuration file should work.

> @a1600012888 Yes, that configuration file should work.

Thanks!

> Hi, thanks for your work.
> Were the results on DDAD produced by training from scratch using the config provided here? https://github.com/TRI-ML/packnet-sfm/blob/master/configs/train_ddad.yaml

Hi, for the DDAD experiments:
Did you train the model using 8 GPU cards with this config file?
If so, does that mean the effective batch size is 8*2=16 and the learning rate is 9e-5?
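
For reference, a minimal sketch of the arithmetic behind that question; the per-GPU batch size of 2 is the assumption stated in the question (about the repo's train_ddad.yaml), not a value confirmed above:

# Effective batch size under Horovod data parallelism:
#     effective batch size = per-GPU batch_size * number of GPUs
#     with the assumed per-GPU batch_size of 2:   2 * 8 = 16
#     with the batch_size of 8 posted above:      8 * 8 = 64
# The learning rate of 9e-5 is the value the config sets; whether the training
# script additionally scales it with the number of workers is not confirmed here.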