TRI-ML/packnet-sfm

How to reproduce the result on DDAD

Opened this issue · 7 comments

Hi,
Thank you for releasing the code. I am trying to train PackNet on DDAD, but so far I cannot reproduce the reported results. I am using 8 V100 GPUs. The training command is 'CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 horovodrun -np 8 -H localhost:8 python scripts/train.py ./configs/train_ddad.yaml'. The details of my config are as follows:
model:
    name: 'SelfSupModel'
    optimizer:
        name: 'Adam'
        depth:
            lr: 0.00009
        pose:
            lr: 0.00009
    scheduler:
        name: 'StepLR'
        step_size: 30
        gamma: 0.5
    depth_net:
        name: 'PackNet01'
        version: '1A'
    pose_net:
        name: 'PoseNet'
        version: ''
    params:
        crop: ''
        min_depth: 0.0
        max_depth: 200.0
datasets:
    augmentation:
        image_shape: (384, 640)
    train:
        batch_size: 8
        num_workers: 8
        dataset: ['DGP']
        path: ['/data/ddad_train_val/ddad.json']
        split: ['train']
        depth_type: ['lidar']
        cameras: [['camera_01']]
        repeat: [5]
    validation:
        num_workers: 8
        dataset: ['DGP']
        path: ['/data/ddad_train_val/ddad.json']
        split: ['val']
        depth_type: ['lidar']
        cameras: [['camera_01']]
    test:
        num_workers: 8
        dataset: ['DGP']
        path: ['/data/ddad_train_val/ddad.json']
        split: ['val']
        depth_type: ['lidar']
        cameras: [['camera_01']]
checkpoint:
    filepath: './data/experiments'
    monitor: 'abs_rel_pp_gt'
    monitor_index: 0
    mode: 'min'

E: 50 BS: 8 - SelfSupModel LR (Adam): Depth 4.50e-05 Pose 4.50e-05

*** /data/ddad_train_val/ddad.json/val (camera_01)

METRIC      | abs_rel | sqr_rel |  rmse  | rmse_log |  a1   |  a2   |  a3
DEPTH       |  0.853  | 23.485  | 37.371 |  2.022   | 0.002 | 0.005 | 0.008
DEPTH_PP    |  0.853  | 23.542  | 37.468 |  2.025   | 0.002 | 0.004 | 0.008
DEPTH_GT    |  0.268  | 12.451  | 19.267 |  0.333   | 0.705 | 0.869 | 0.936
DEPTH_PP_GT |  0.257  | 11.199  | 18.532 |  0.324   | 0.709 | 0.873 | 0.939

Is there anything wrong with my setup? Thank you for your attention.

Hmm, can you try a few things:

  • Start from a pre-trained model (e.g. a KITTI model) to see if it diverges
  • Try another network (DepthResNet or PoseResNet)
  • Play around with the learning rate (a rough config sketch of these changes follows below)

By the way, once you get some numbers you can try submitting to our EvalAI DDAD challenge!
https://eval.ai/web/challenges/challenge-page/902/overview
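
As a rough sketch, those changes could look like this in the config. Double-check the exact keys and version strings against configs/default_config.py; the checkpoint path below is only a placeholder, and the learning rate is just an example value to experiment with:

model:
    # start from a pre-trained checkpoint (e.g. a KITTI model) instead of from scratch
    checkpoint_path: '/data/models/PackNet01_MR_selfsup_K.ckpt'   # placeholder path
    optimizer:
        name: 'Adam'
        depth:
            lr: 0.0002        # example of an alternative learning rate to try
        pose:
            lr: 0.0002
    depth_net:
        name: 'DepthResNet'   # alternative to PackNet01
        version: '18pt'       # assumed ResNet-18 variant with ImageNet pre-training
    pose_net:
        name: 'PoseResNet'    # alternative to PoseNet
        version: '18pt'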


Do you use any pre-trained weights to get the results of 0.173 (abs_rel) on DDAD and 0.111 (abs_rel) on KITTI, or do you train from scratch?

No, those are trained from scratch with PackNet. I just mentioned pre-trained weights as a way to see if there is anything wrong with the training setup that you are using.

Hi, thanks for your work.
Were the results on DDAD produced by training from scratch using the config provided here? https://github.com/TRI-ML/packnet-sfm/blob/master/configs/train_ddad.yaml

@a1600012888 Yes, that configuration file should work.

> @a1600012888 Yes, that configuration file should work.

Thanks!

> Hi, thanks for your work.
> Were the results on DDAD produced by training from scratch using the config provided here? https://github.com/TRI-ML/packnet-sfm/blob/master/configs/train_ddad.yaml

Hi, for the DDAD experiments:
Did you train the model using 8 GPU cards with this config file?
If so, does that mean the effective batch size is 8*2=16 and the learning rate is 9e-5?
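
For reference, a minimal sketch of the arithmetic behind that question; the per-GPU batch size of 2 is the assumption stated in the question (about the repo's train_ddad.yaml), not a value confirmed above:

# Effective batch size under Horovod data parallelism:
#     effective batch size = per-GPU batch_size * number of GPUs
#     with the assumed per-GPU batch_size of 2:   2 * 8 = 16
#     with the batch_size of 8 posted above:      8 * 8 = 64
# The learning rate of 9e-5 is the value the config sets; whether the training
# script additionally scales it with the number of workers is not confirmed here.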