j96w/DenseFusion

Cannot replicate the published results

LennardBo opened this issue · 7 comments

I have adapted DenseFusion for my work, but when I tried to replicate the original results using the provided weights, the original code and the original evaluation, the results were noticeably worse than the published ones (see below). The only changes I made were fixing the issue mentioned in #91 and printing the AUC and ADD<2cm together in the MATLAB file.
I used the following Python libraries as instructed:
Python 3.6.12, PyTorch 0.4.1 and cuda90 1.0 (CUDA 9.0, see this link)

I also implemented my own version of the evaluation, which iterates first over the objects and then over the target frames, calculates the ADD(-S) directly in Python, and leaves the postprocessing to a Jupyter notebook. I got the same results as with the original evaluation, which makes me believe my evaluation is sound and the network simply performs worse for most classes (although better for classes 16 and 20). Has anyone experienced the same?
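For reference, the two distances mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration of the usual ADD and ADD-S definitions, not the code used in this issue or in the DenseFusion repository. The function names, the brute-force nearest-neighbour search, and the choice to match each predicted point to its closest ground-truth point are assumptions made for the example.

```python
import numpy as np

def transform(points, R, t):
    """Apply a rigid transform (R, t) to an (N, 3) array of model points."""
    return points @ R.T + t

def add_metric(points, R_pred, t_pred, R_gt, t_gt):
    """ADD: mean distance between corresponding points under the two poses."""
    pred = transform(points, R_pred, t_pred)
    gt = transform(points, R_gt, t_gt)
    return np.linalg.norm(pred - gt, axis=1).mean()

def adds_metric(points, R_pred, t_pred, R_gt, t_gt):
    """ADD-S: mean distance from each predicted point to its closest ground-truth point."""
    pred = transform(points, R_pred, t_pred)
    gt = transform(points, R_gt, t_gt)
    # Brute-force pairwise distances (N x N); fine for a few thousand points.
    dists = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=2)
    return dists.min(axis=1).mean()

# Toy object that is symmetric under a 180 degree rotation about the z axis.
pts = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]])
identity = np.eye(3)
flip_z = np.diag([-1.0, -1.0, 1.0])  # 180 degree rotation about z
zero_t = np.zeros(3)

print(add_metric(pts, flip_z, zero_t, identity, zero_t))   # 2.0
print(adds_metric(pts, flip_z, zero_t, identity, zero_t))  # 0.0
```

The toy case shows why the two metrics can diverge strongly for symmetric objects: the flipped pose is indistinguishable in shape, so ADD-S is zero while ADD is large.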

Here, i = 3 is the index corresponding to the unrefined results, as in evaluate_poses_keyframe.m:
class 1, i 3: AUC(s):95.31, ADD-s<2cm:100.00, AUC:70.74, ADD<2cm:70.68
class 2, i 3: AUC(s):92.44, ADD-s<2cm:98.96, AUC:86.87, ADD<2cm:88.71
class 3, i 3: AUC(s):95.05, ADD-s<2cm:100.00, AUC:90.74, ADD<2cm:96.28
class 4, i 3: AUC(s):93.74, ADD-s<2cm:96.94, AUC:84.65, ADD<2cm:82.50
class 5, i 3: AUC(s):95.81, ADD-s<2cm:100.00, AUC:90.96, ADD<2cm:94.12
class 6, i 3: AUC(s):95.73, ADD-s<2cm:100.00, AUC:79.61, ADD<2cm:58.97
class 7, i 3: AUC(s):94.38, ADD-s<2cm:99.53, AUC:89.41, ADD<2cm:93.93
class 8, i 3: AUC(s):97.20, ADD-s<2cm:100.00, AUC:95.80, ADD<2cm:100.00
class 9, i 3: AUC(s):89.30, ADD-s<2cm:92.95, AUC:79.57, ADD<2cm:76.37
class 10, i 3: AUC(s):90.12, ADD-s<2cm:89.45, AUC:76.87, ADD<2cm:59.10
class 11, i 3: AUC(s):93.61, ADD-s<2cm:100.00, AUC:86.89, ADD<2cm:86.84
class 12, i 3: AUC(s):94.40, ADD-s<2cm:99.61, AUC:87.50, ADD<2cm:86.20
class 13, i 3: AUC(s):86.10, ADD-s<2cm:60.59, AUC:5.07, ADD<2cm:0.00
class 14, i 3: AUC(s):95.25, ADD-s<2cm:100.00, AUC:83.81, ADD<2cm:79.72
class 15, i 3: AUC(s):92.14, ADD-s<2cm:95.46, AUC:83.85, ADD<2cm:82.59
class 16, i 3: AUC(s):89.50, ADD-s<2cm:98.35, AUC:32.25, ADD<2cm:0.00
class 17, i 3: AUC(s):90.21, ADD-s<2cm:98.90, AUC:77.27, ADD<2cm:49.72
class 18, i 3: AUC(s):95.08, ADD-s<2cm:100.00, AUC:89.10, ADD<2cm:91.05
class 19, i 3: AUC(s):71.52, ADD-s<2cm:77.67, AUC:24.87, ADD<2cm:0.00
class 20, i 3: AUC(s):70.17, ADD-s<2cm:72.29, AUC:25.17, ADD<2cm:11.14
class 21, i 3: AUC(s):92.21, ADD-s<2cm:100.00, AUC:50.81, ADD<2cm:0.00
class 22, i 3: AUC(s):91.19, ADD-s<2cm:94.91, AUC:74.19, ADD<2cm:68.18
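The AUC and <2cm columns above are both summaries of per-frame distances. As a rough sketch of how such numbers can be derived in Python, the function below integrates the accuracy-threshold curve exactly as a step function. The 0.1 m maximum threshold follows the convention described in the PoseCNN paper and is an assumption here, as are the function and parameter names. The MATLAB toolbox uses its own integration routine, so small differences from its output are to be expected.

```python
import numpy as np

def auc_and_under_threshold(distances, max_dist=0.1, threshold=0.02):
    """Return (AUC in percent, percent of samples with distance < threshold).

    distances: per-sample ADD or ADD-S values in metres.
    The accuracy-threshold curve is integrated from 0 to max_dist and
    normalised so that a perfect result scores 100.
    """
    d = np.sort(np.asarray(distances, dtype=float))
    n = d.size
    if n == 0:
        raise ValueError("need at least one distance")
    # accuracy[k] is the fraction of samples with distance <= d[k]
    accuracy = np.arange(1, n + 1) / n
    keep = d <= max_dist  # samples beyond max_dist never count as correct
    x = np.concatenate(([0.0], d[keep], [max_dist]))
    y = np.concatenate(([0.0], accuracy[keep]))
    # Step-wise integration: the accuracy reached at x[i] holds until x[i + 1].
    area = np.sum((x[1:] - x[:-1]) * y)
    auc = 100.0 * area / max_dist
    under = 100.0 * np.mean(d < threshold)
    return auc, under

# One frame at 1 cm error and one frame far outside the 10 cm range.
auc, under = auc_and_under_threshold([0.01, 0.2])
print(round(auc, 2), round(under, 2))  # 45.0 50.0
```

Because frames with an error above the maximum threshold contribute nothing, a class where most ADD distances exceed 10 cm ends up with an AUC close to zero even if ADD-S remains high.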

j96w commented

Hi, sorry for the late reply. Could you provide more information about your evaluation process? Why are there 22 objects? I thought there were 21 objects in the YCB dataset. I remember testing multiple times two years ago, before releasing the code and weights, and all runs reached or even exceeded our published results. BTW, did you use the released PoseCNN segmentation results as the first step?

Thanks for replying! I have actually finished my work on this with these issues unresolved (as they shouldn't affect my variation too much). The 22 objects stem from line 24 of plot_accuracy_keyframe.m in the YCB Video Toolbox (it should be at a similar line number after your replacements): the last "class" should be the average of all 21 objects.
I did use the released PoseCNN segmentation results and noticed that some were very bad, partly due to the issues your work mentioned in differentiating the large and extra large clamps, but also due to poor depth perception. I assumed the latter was caused by metallic surfaces and other factors; in some cases it drastically reduced the number of usable pixels.
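The relationship between row 22 and the 21 per-object rows can be checked directly from the printed results. The following is a minimal sketch that parses lines in the format shown in the first post and computes the plain mean of a metric over the object rows. The helper names are hypothetical, and whether row 22 is a mean of per-object scores or is computed over all samples pooled together is not settled in this thread, so the two numbers need not match exactly.

```python
import re

# Matches lines such as:
# class 1, i 3: AUC(s):95.31, ADD-s<2cm:100.00, AUC:70.74, ADD<2cm:70.68
LINE_RE = re.compile(
    r"class (\d+), i (\d+): AUC\(s\):([\d.]+), ADD-s<2cm:([\d.]+), "
    r"AUC:([\d.]+), ADD<2cm:([\d.]+)"
)

def parse_results(text, index=3):
    """Map class id -> metrics dict for every line whose i equals index."""
    results = {}
    for m in LINE_RE.finditer(text):
        if int(m.group(2)) != index:
            continue
        results[int(m.group(1))] = {
            "auc_s": float(m.group(3)),
            "add_s_2cm": float(m.group(4)),
            "auc": float(m.group(5)),
            "add_2cm": float(m.group(6)),
        }
    return results

def mean_over_objects(results, key, n_objects=21):
    """Plain mean of one metric over classes 1..n_objects that are present."""
    values = [results[c][key] for c in range(1, n_objects + 1) if c in results]
    if not values:
        raise ValueError("no object rows found")
    return sum(values) / len(values)

# Two object rows and the summary row copied from the first post.
sample = """\
class 1, i 3: AUC(s):95.31, ADD-s<2cm:100.00, AUC:70.74, ADD<2cm:70.68
class 2, i 3: AUC(s):92.44, ADD-s<2cm:98.96, AUC:86.87, ADD<2cm:88.71
class 22, i 3: AUC(s):91.19, ADD-s<2cm:94.91, AUC:74.19, ADD<2cm:68.18
"""
results = parse_results(sample)
print(mean_over_objects(results, "auc_s"), results[22]["auc_s"])
```

Feeding the full 22-line listing into `parse_results` and comparing `mean_over_objects(results, key)` with `results[22][key]` for each metric shows how closely the summary row tracks a simple per-object average.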

j96w commented

Just a very quick check: how many refinement iterations did you use with our released refinement model? It should be an even number. Did you try more iterations, maybe 4 or 6 instead of 2?

Sorry for the late reply. I left the number of iterations untouched at 2, as in the paper, and did not try to increase it, as my work concentrated on modifying the estimator.

j96w commented

Hi @LennardBo, I have re-run the DenseFusion evaluation on the YCB dataset and it performs very well on my side, achieving an AUC of 93.15% (higher than the 93.1% we reported in our paper). Here are the AUC result plots of the 21 testing objects: DFr.pdf. The final mean score AUC plot is:
(image: DFr 002)

I didn't change the DenseFusion code, so the refinement process still uses 2 iterations, as in the paper. The platform I'm running on is PyTorch 1.7, CUDA 11.0, with an Nvidia RTX 3070 GPU. The testing checkpoints I'm using are the ones I released two years ago: pose_model_26_0.012863246640872631.pth and pose_refine_model_69_0.009449292959118935.pth. The results are produced by YCB_Video_toolbox without any code modification. Please make sure you are using this official evaluation toolbox correctly when testing on the YCB dataset.

WW-0 commented

What version of PyTorch are you using with a 3070 graphics card? Are there any errors when compiling KNN?

I have the same question as above: are there any errors when compiling KNN with CUDA 11?