YvanYin/Metric3D

Dense depth map output

Deng-King opened this issue · 1 comment

Hi there, thank you for your contributions.

I ran the demo code following the readme.md tutorial (using the three images in the ./data/kitti_demo/ folder of this repository), but I only got a relatively sparse depth prediction. How do I get dense depth estimates like the ones shown on the project page?

What I just got:
[Image: 20240926-152444]
[Image: 0000000005_merge]

What I'm expecting:
[Image]

I got a good result by running test_vit.sh. I eventually figured out that the second row of the visualization output is the predicted depth map and the third row is the ground truth (GT). The GT was probably captured by LiDAR, which would explain why it looks relatively sparse.
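One quick way to tell the sparse LiDAR ground truth apart from the dense prediction is to measure the fraction of pixels that carry a valid depth value. A minimal sketch, assuming invalid pixels are stored as 0 (a common convention for LiDAR depth maps, but an assumption here); `valid_fraction` is a hypothetical helper and not part of Metric3D, and the arrays are synthetic stand-ins:

```python
import numpy as np

def valid_fraction(depth, invalid_value=0.0):
    """Return the fraction of pixels that carry a valid depth value."""
    depth = np.asarray(depth, dtype=np.float64)
    # Assumption: invalid pixels are marked with `invalid_value` (or are NaN/inf).
    valid = np.isfinite(depth) & (depth != invalid_value)
    return float(valid.mean())

# Synthetic stand-ins: a dense prediction and a LiDAR-like sparse map.
rng = np.random.default_rng(0)
dense_pred = rng.uniform(1.0, 80.0, size=(48, 160))
sparse_gt = np.where(rng.random((48, 160)) < 0.05, dense_pred, 0.0)

print(f"prediction:    {valid_fraction(dense_pred):.1%} valid")
print(f"LiDAR-like GT: {valid_fraction(sparse_gt):.1%} valid")
```

A row that is only a few percent valid is most likely the ground truth rather than the model output.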

[Image: 20240926-163915]
[Image: 0000000005_merge2]

But now I cannot seem to get a correct depth map when using known camera intrinsics (e.g. by running test_kitti.sh and test_nyu.sh). I added the following print statements at line 263 of do_test.py:

    ...

    pred_depths, outputs = get_prediction(
        model=model,
        input=torch.stack(rgb_inputs),  # Stack inputs for batch processing
        cam_model=None,
        pad_info=pads,
        scale_info=None,
        gt_depth=None,
        normalize_scale=None,
    )
    print(' -- pred --') # line 263
    print(pred_depths.shape)
    print(pred_depths.max())
    print(pred_depths.mean())
    print(pred_depths.max())
    
    for j, gt_depth in enumerate(gt_depths):
        normal_out = None

    ...

The command-line output is:

[09/26 16:25:10 root]: Distributed training: False
[09/26 16:25:15 root]: Loading weight '/media/deng/Data/Metric3D/weight/metric_depth_vit_large_800k.pth'
[09/26 16:25:15 root]: Loading weight '/media/deng/Data/Metric3D/weight/metric_depth_vit_large_800k.pth'
[09/26 16:25:16 root]: Successfully loaded weight: '/media/deng/Data/Metric3D/weight/metric_depth_vit_large_800k.pth'
[09/26 16:25:16 root]: Successfully loaded weight: '/media/deng/Data/Metric3D/weight/metric_depth_vit_large_800k.pth'
  0%|          | 0/3 [00:00<?, ?it/s]data/nyu_demo/rgb/rgb_00000.jpg
 -- pred --
torch.Size([1, 1, 480, 1216])
tensor(24.2716, device='cuda:0')
tensor(24.2192, device='cuda:0')
tensor(24.2716, device='cuda:0')
/media/deng/Data/Metric3D/mono/utils/do_test.py:322: RankWarning: Polyfit may be poorly conditioned
  pred_global, _ = align_scale_shift(pred_depth, gt_depth)
 33%|███▎      | 1/3 [00:01<00:02,  1.31s/it]data/nyu_demo/rgb/rgb_00050.jpg
 -- pred --
torch.Size([1, 1, 480, 1216])
tensor(24.2716, device='cuda:0')
tensor(24.2192, device='cuda:0')
tensor(24.2716, device='cuda:0')
/media/deng/Data/Metric3D/mono/utils/do_test.py:322: RankWarning: Polyfit may be poorly conditioned
  pred_global, _ = align_scale_shift(pred_depth, gt_depth)
 67%|██████▋   | 2/3 [00:01<00:00,  1.62it/s]data/nyu_demo/rgb/rgb_00100.jpg
 -- pred --
torch.Size([1, 1, 480, 1216])
tensor(24.2716, device='cuda:0')
tensor(24.2196, device='cuda:0')
tensor(24.2716, device='cuda:0')
/media/deng/Data/Metric3D/mono/utils/do_test.py:322: RankWarning: Polyfit may be poorly conditioned
  pred_global, _ = align_scale_shift(pred_depth, gt_depth)
100%|██████████| 3/3 [00:01<00:00,  1.90it/s]
missing gt_depth, only save visualizations...

The max and mean are nearly identical for every image, which suggests the model is predicting an almost constant (and wrong) depth, but I don't know why 😭
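The printed statistics can be turned into an explicit check. The sketch below flags a depth map as degenerate when its spread (max minus min) is tiny relative to its mean. `is_near_constant` and the 1% tolerance are illustrative assumptions, not anything from Metric3D, and the arrays are synthetic. It works on NumPy arrays, so a tensor such as `pred_depths` would first need `.detach().cpu().numpy()`.

```python
import numpy as np

def is_near_constant(depth, rel_tol=0.01):
    """Return True if the depth values barely vary relative to their mean."""
    depth = np.asarray(depth, dtype=np.float64)
    spread = depth.max() - depth.min()
    # rel_tol is an arbitrary illustrative threshold, not a tuned value.
    return bool(spread <= rel_tol * abs(depth.mean()))

# Synthetic examples: an almost flat map vs. a map with real depth variation.
rng = np.random.default_rng(0)
flat_map = 24.2 + rng.uniform(-0.05, 0.05, size=(48, 160))
varied_map = rng.uniform(1.0, 80.0, size=(48, 160))

print("flat map near-constant:  ", is_near_constant(flat_map))
print("varied map near-constant:", is_near_constant(varied_map))
```

If a check like this fires on every image, the problem is more likely in the inputs or the call arguments than in any one picture.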

The visualization is still the same:

[Image]
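The `RankWarning: Polyfit may be poorly conditioned` in the log fits the same picture. Assuming `align_scale_shift` performs a degree-1 least-squares fit between prediction and ground truth (the warning text suggests `np.polyfit`, but that is an assumption about the implementation), a constant prediction makes the fit rank-deficient. A small standalone sketch with synthetic data; `fit_scale_shift` is a hypothetical stand-in, not the repository function:

```python
import warnings
import numpy as np

def fit_scale_shift(pred, target):
    """Fit target ~ scale * pred + shift and report whether NumPy warned about rank."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        scale, shift = np.polyfit(pred, target, deg=1)
    # Match by class name so this works whether RankWarning lives in
    # the top-level numpy namespace or in numpy.exceptions.
    rank_warned = any(w.category.__name__ == "RankWarning" for w in caught)
    return float(scale), float(shift), rank_warned

rng = np.random.default_rng(0)
target = rng.uniform(1.0, 10.0, size=200)

# Healthy case: the prediction varies, and target = 2 * pred + 0.5 exactly.
varied_pred = (target - 0.5) / 2.0
print("varied prediction:  ", fit_scale_shift(varied_pred, target))

# Degenerate case: every predicted value is identical, so the fit is ill-posed.
constant_pred = np.full(200, 24.27)
print("constant prediction:", fit_scale_shift(constant_pred, target))
```

In the degenerate case the fitted scale and shift are not meaningful, so the warning is a symptom of the flat prediction rather than a separate bug.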