How to choose the best training model?
Hello, I noticed you mentioned:
Q4. Testing Epoch Matters.
By default, our model trains for 16 epochs. But how should you select the best training model for testing to achieve the best performance? One solution is to use PyTorch Lightning. For simplicity, you can decide which checkpoint to use based on the .log file we provide.
I want to know how you use PyTorch Lightning to choose the best training model. When checking the log file or the training curves, which metric do you think is more important? I've tested all 16 checkpoints, and the results (overall/comp/acc) are shown in the figure below:
And the training curves for the losses are also shown below:
- The loss curve looks a little weird (I think it's caused by dds_loss).
- thres2mm (and 4mm/8mm) keeps decreasing throughout training.
- The abs_loss (similar to EPE) curve doesn't seem to correlate strictly with the overall result curve, as illustrated below:
Based on my observations, it appears that the best results are achieved around epoch 7, and further training from epoch 8 to 15 doesn't seem to significantly improve the outcome.
This leaves me confused: how many epochs should we train, and how do we choose the best checkpoint? Could you provide insights or recommendations on this?
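For context, my understanding of the PyTorch Lightning route mentioned in Q4 is a checkpoint callback along these lines. This is only a sketch: the `MVSModule` name and the logged metric key `val_thres2mm` are my assumptions, not this repo's actual code.

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# Keep the k best checkpoints by a validation metric instead of just the last epoch.
checkpoint_cb = ModelCheckpoint(
    monitor="val_thres2mm",              # assumed logged key; pick the metric you trust
    mode="min",                          # thres2mm is an error ratio, so smaller is better
    save_top_k=3,                        # keep the 3 best epochs
    filename="{epoch:02d}-{val_thres2mm:.4f}",
)

trainer = pl.Trainer(max_epochs=16, callbacks=[checkpoint_cb])
# trainer.fit(MVSModule(), datamodule=...)  # MVSModule / datamodule are placeholders
print(checkpoint_cb.best_model_path)        # path of the best checkpoint after training
```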
Welcome back, our old friend! 👋
Thank you for conducting extensive replication experiments on our work, which I believe will be very helpful to other researchers.
Focusing on your question: we also noticed similar issues in our early experiments. In some cases, MVS networks need only a few training epochs to achieve amazing results, and subsequent training seems meaningless. But if you take all the DTU training checkpoints and test them on the Tanks and Temples dataset, I think the conclusion will be completely different. As with several of the questions you have raised before, DTU itself has to take a lot of the blame. You have to understand that the DTU GT is itself the product of a measurement (similar to measuring a pencil with a ruler as a child), so it necessarily contains some error.
So now the conclusion is obvious => it is only meaningful to discuss the corresponding losses under some error tolerance => thres2mm/4mm/8mm.
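Concretely, these thresholded metrics are just the fraction of valid pixels whose depth error exceeds the tolerance, along the lines of this sketch (illustrative code, not our exact implementation):

```python
import torch

def thres_metric(depth_pred, depth_gt, mask, thresh_mm):
    """Fraction of valid pixels whose absolute depth error exceeds thresh_mm.

    Sketch only: assumes depth maps are in millimeters and `mask` marks
    pixels with valid ground truth.
    """
    err = (depth_pred - depth_gt).abs()
    return ((err > thresh_mm) & mask).float().sum() / mask.float().sum()

# thres2mm = thres_metric(pred, gt, mask, 2.0), and likewise for 4mm / 8mm.
```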
I think you already have the answer to the rest of your questions.
@xingchen2022 Recently, I have been preparing an MVS course series (in Chinese). I think it would suit you very well.
> But if you take all the DTU training checkpoints and test them on the Tanks and Temples dataset, I think the conclusion will be completely different.
Oh, I hadn't considered this before; you really make a good point here! But I'm still a little confused about "it is only meaningful to discuss the corresponding losses under some error tolerance => thres2mm/4mm/8mm." Does this imply that comparing 'abs_loss' is only valuable when we limit the discussion to scenarios with similar 'thres2mm/4mm/8mm' values?
Btw, I've also conducted a comparison between training models with and without dds_loss, and the results are depicted in the figure below:
As you previously mentioned, the curve of dds_loss appears unusual, but it seems to enhance the final result. I've also compared the training curves of these models, as shown in the following figures (orange represents 'ori,' and sky-blue represents 'ori_wo_dds'):
In this comparison, the '2/4/8mm' threshold losses for 'ori' and 'ori_wo_dds' appear similar, while 'abs_loss' is notably smaller for 'ori_wo_dds'. However, this reduction in 'abs_loss' doesn't translate into a better overall score; on the contrary, 'ori' has the lower overall loss. My understanding is that the '2/4/8mm' losses, which roughly represent the coverage of accurately estimated depth pixels, are more critical than 'abs_loss', which represents the average absolute depth error over all pixels. What do you think of this?
Regarding your point that the DTU ground-truth depth is not absolutely accurate: I attempted to reconstruct the point cloud directly from the ground-truth depth maps, which yielded accuracy and completeness values of approximately 0.1931 and 0.2383. This suggests that there might still be a little room for improvement if we can obtain a depth map that is 'more accurate' than the ground truth. Would you agree with this perspective?
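For anyone reproducing this: back-projecting a GT depth map into a point cloud only takes a few lines. A minimal sketch, assuming a pinhole intrinsic matrix K and world-to-camera extrinsics R, t (the names are mine, not the repo's):

```python
import numpy as np

def depth_to_points(depth, K, R, t):
    """Back-project a depth map to world-space 3D points (sketch).

    depth: (H, W) depth map, zeros mark invalid pixels; K: 3x3 intrinsics;
    R, t: world-to-camera rotation/translation, i.e. X_cam = R @ X_world + t.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0
    pix = np.stack([u[valid], v[valid], np.ones(valid.sum())])  # 3 x N homogeneous pixels
    cam = np.linalg.inv(K) @ pix * depth[valid]                 # camera-space points
    world = R.T @ (cam - t.reshape(3, 1))                       # invert the extrinsics
    return world.T                                              # N x 3 world-space points
```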
> @xingchen2022 Recently, I have been preparing an MVS course series (in Chinese). I think it would suit you very well.
I will buy it! Really thank you for your always patient explanation and guidance, paying for knowledge should be encouraged!
> Does this imply that comparing 'abs_loss' is only valuable when we limit the discussion to scenarios with similar 'thres2mm/4mm/8mm' values?
Yes. Currently, abs_loss only measures the pixel-wise depth difference against the depth GT.
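In code terms, something like this sketch (the actual masking details may differ in the repo):

```python
import torch

def abs_loss(depth_pred, depth_gt, mask):
    # Sketch: mean absolute depth error over valid pixels only.
    return (depth_pred - depth_gt).abs()[mask].mean()
```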
> My understanding is that the '2/4/8mm' losses, which roughly represent the coverage of accurately estimated depth pixels, are more critical than 'abs_loss', which represents the average absolute depth error over all pixels. What do you think of this?
I can't agree more.
Think of it through an extreme case in another dimension. For example, suppose my method makes the depth estimation at the edges of the subject very accurate; because edge pixels account for only a small proportion of the image, the improvement is invisible behind a nearly identical average abs_loss.
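A quick toy-number illustration of that edge case (all numbers are made up):

```python
import torch

# Toy example: 10,000 pixels, of which only 2% are edge pixels.
n, edge_frac = 10_000, 0.02
edge = torch.zeros(n, dtype=torch.bool)
edge[: int(n * edge_frac)] = True

err_a = torch.full((n,), 5.0)            # model A: uniform 5mm error everywhere
err_b = err_a.clone()
err_b[edge] = 0.5                        # model B: near-perfect on edges only

print(err_a.mean(), err_b.mean())        # abs_loss barely moves: 5.00 vs 4.91
print((err_a[edge] > 2).float().mean(),  # thres2mm restricted to edges: 1.0 -> 0.0
      (err_b[edge] > 2).float().mean())
```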
> This suggests that there might still be a little room for improvement if we can obtain a depth map that is 'more accurate' than the ground truth. Would you agree with this perspective?
I have tried this before. But I don't think it proves that the depth GT cannot provide wrong supervision to the network.
For MVS tasks, how you read the loss indicators may differ from traditional vision tasks, especially considering that the depth map is only an intermediate product.
I understand your point.
Using the DTU MATLAB code to compute the final score takes too much time, and Tanks and Temples results can only be submitted once a week. I've been trying different approaches to assess whether a trained model has improved. However, it is not easy to relate depth-map quality to the final point-cloud results, which could be a limitation of MVSNet-based methods.
An end-to-end model might be a better solution, but achieving comparable results with it could be more challenging :(
Anyway, thank you as always! ❤