xy-guo/MVSNet_pytorch

How to reproduce evaluation metrics?


Thank you for sharing this code. However, I have some questions about how to reproduce evaluation metrics.
(1) How many GPUs did you use?
(2) What is the batch size per GPU?
(3) Did you use refinement in MVSNet?

I tried 4 GPUs (batch size = 1) and 4 GPUs (batch size = 2); the results are below, and there is still a gap to your reported results. (PS: I didn't use refinement.)

       Acc      Comp     Overall
4x2:   0.5636   0.5393   0.5514
4x1:   0.6434   0.6790   0.6612

I would really appreciate it if you could kindly provide some advice.

Hello, @JeffWang987

  1. You can first use the pre-trained model provided by the author to reproduce the evaluation results (a minimal loading sketch follows after this list).

  2. You could try to train the network using only one GPU card to see the difference.

  3. Depth refinement is not used in this project.
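
Regarding point 1, here is a minimal loading sketch, assuming the layout of this repo: the `refine` flag, the checkpoint filename, and the `"model"` key in the saved dict are assumptions, and `eval.py` contains the authoritative loading code.

```python
# Minimal sketch (not the repo's exact code): load a pre-trained checkpoint
# for evaluation. Paths, the refine flag, and the "model" key are assumptions.
import torch
from torch import nn
from models.mvsnet import MVSNet

model = MVSNet(refine=False)          # depth refinement is not used in this project
model = nn.DataParallel(model)        # matches the "module." prefix in saved weights
ckpt = torch.load("checkpoints/model_000014.ckpt", map_location="cpu")  # assumed filename
model.load_state_dict(ckpt["model"])  # "model" key is an assumption
model = model.cuda().eval()
```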

Hello, @JeffWang987 @xy-guo
Where should the code be changed to run on multiple cards?
Traceback (most recent call last):
File "/home/camellia/zyf/MVSNet_pytorch-master/eval.py", line 302, in
save_depth()
File "/home/camellia/zyf/MVSNet_pytorch-master/eval.py", line 113, in save_depth
outputs = model(sample_cuda["imgs"], sample_cuda["proj_matrices"], sample_cuda["depth_values"])
File "/home/camellia/anaconda3/envs/mvsnet-pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/camellia/anaconda3/envs/mvsnet-pytorch/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/camellia/anaconda3/envs/mvsnet-pytorch/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/camellia/anaconda3/envs/mvsnet-pytorch/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/camellia/anaconda3/envs/mvsnet-pytorch/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/camellia/anaconda3/envs/mvsnet-pytorch/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/camellia/anaconda3/envs/mvsnet-pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/camellia/zyf/MVSNet_pytorch-master/models/mvsnet.py", line 123, in forward
warped_volume = homo_warping(src_fea, src_proj, ref_proj, depth_values)
File "/home/camellia/zyf/MVSNet_pytorch-master/models/module.py", line 127, in homo_warping
warped_src_fea = F.grid_sample(src_fea, grid.view(batch, num_depth * height, width, 2), mode='bilinear',
File "/home/camellia/anaconda3/envs/mvsnet-pytorch/lib/python3.8/site-packages/torch/nn/functional.py", line 3836, in grid_sample
return torch.grid_sampler(input, grid, mode_enum, padding_mode_enum, align_corners)
RuntimeError: CUDA out of memory. Tried to allocate 2.71 GiB (GPU 0; 11.77 GiB total capacity; 6.45 GiB already allocated; 1.92 GiB free; 8.33 GiB reserved in total by PyTorch)
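
For reference, a hedged sketch of possible workarounds for the error above, under the assumption that eval.py runs with a batch size of 1: in that case nn.DataParallel cannot split the sample, so the whole cost volume is built on GPU 0 and a second card does not reduce peak memory.

```python
# Sketch of workarounds, not a confirmed fix from the maintainers.
import os

# (a) pin evaluation to a single card before CUDA is initialized
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
from models.mvsnet import MVSNet

# (b) skip the DataParallel wrapper for single-card evaluation
model = MVSNet(refine=False).cuda().eval()  # constructor argument is an assumption

# (c) if 12 GB is still not enough, lower the number of depth hypotheses or the
# input resolution: the warped volume built in homo_warping scales with
# batch x channels x num_depth x height x width.
```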

Hello @zhao-you-fei
You can use CasMVSNet for multi-view depth estimation; it achieves more accurate depth estimation with lower GPU memory consumption. Besides, CasMVSNet supports multi-GPU training.

@XYZ-qiyh
Thanks, I know about CasMVSNet (although that network also gets stuck at the visualization step), but I want to know whether this MVSNet_pytorch version must be run on a single card. I only have two 3060 cards and wanted to load the pre-trained model directly for testing, but I got the error above. I want to run a comparison experiment, and this problem is still unsolved. Do you have any suggestions? I have also tried all the answers in issue #3, but the same error still occurs, and I don't know whether it is caused by insufficient memory or by uneven parameter allocation across the two cards. Please forgive my poor English; I can only communicate in Chinese. Thanks again for your reply.
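
For reference, a minimal sketch of general nn.DataParallel behavior (not specific to this repo): a batch of 1 is never split across two cards, so the error above is a per-card memory limit rather than uneven parameter allocation between the two 3060s.

```python
# Minimal illustration: with batch size 1, DataParallel has nothing to scatter
# to the second card, so only replica 0 on cuda:0 runs. Needs two visible GPUs.
import torch
import torch.nn as nn

class Probe(nn.Module):
    def forward(self, x):
        print("forward on", x.device, "with batch size", x.shape[0])
        return x

model = nn.DataParallel(Probe(), device_ids=[0, 1]).cuda()
model(torch.randn(1, 3, 4, 4).cuda())  # prints only cuda:0
model(torch.randn(2, 3, 4, 4).cuda())  # split across cuda:0 and cuda:1
```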

@JeffWang987 @XYZ-qiyh @xy-guo
Hi, have you addressed this problem?

I can reproduce the evaluation results using the pre-trained model. However, when I re-train the model myself (batch size = 4, a single 3090 Ti GPU, without depth refinement), the results are much worse:

       Acc      Comp     Overall
       0.6212   0.4912   0.5562

Can you provide some suggestions to solve this issue?