NVlabs/Deep_Object_Pose

Issue with running inference on shiny meat can dataset


Hello! I'm trying to run the ROS-independent inference with the DOPE network on the shiny meat can dataset that you provide here, inside a Docker container built from the Dockerfile you include. I'm using these weights, and I've downloaded the cm-aligned 3D model (010_potted_meat_can) with the NVDU toolkit, as you suggest. The issue is that the results are rather poor (see below), and even though this particular object and the respective scenes are challenging, I don't think the numbers are consistent with the results you report in your publication. I've spent several days reading the documentation, the GitHub issues, and your publications trying to figure out what I'm doing wrong. Would it be possible to give me some directions? I'm trying to recreate your results because I've re-trained the model on the meat can and would like to compare the performance. I also suspect that one reason for the poor results is that the YCB model I'm using does not have the corrected texture you describe in the NViSII paper. If that's the case, would it be possible to provide the 3D model and/or the corrected texture?
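For completeness, this is the quick sanity check I ran on the downloaded model's units and cuboid size (a minimal sketch; trimesh and the file path are my own choices, not part of the DOPE tooling):

```python
import trimesh

# Load the cm-aligned YCB model fetched with the NVDU toolkit.
# The path below is only an example from my local setup.
mesh = trimesh.load(
    "path/to/aligned_cm/010_potted_meat_can/google_16k/textured.obj",
    force="mesh",
)

# The axis-aligned bounding box should roughly match the cuboid dimensions
# DOPE uses for this object, in centimeters (values around ~10, not ~0.1).
print("cuboid extents (cm?):", mesh.extents)
print("bounding box corners:", mesh.bounds)
```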

In this link you can also find a few indicative visualizations of the results, as well as the configuration files used for inference. The forked code with my modifications is available here.

Quantitative results from running inference on scene 000:

mean 413329.88733643986 std 24477194.25580374 ratio 3529/6000
auc at  0.02 : 0.0
auc at  0.04 : 0.0
auc at  0.06 : 0.004333333333333333
auc 0.03403944444444444

[output plot]

Quantitative results from running inference on scenes 000, 001, and 002:

    	"mean": 166931.02042649745,
    	"std": 14544135.92708221,
    	"ratio": "10331/18000",
    	"auc_val_thres": [
        	{
            	"0.02": 0.0
        	},
        	{
            	"0.04": 5.555555555555556e-05
        	},
        	{
            	"0.06": 0.0034444444444444444
        	}
    	],
    	"auc": 0.033453796296296294

[output plot]
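For reference, the numbers above come from scripts/metrics/add_compute.py; the sketch below is only my simplified understanding of the ADD / AUC computation, not the actual script:

```python
import numpy as np

def add_error(model_points, R_gt, t_gt, R_est, t_est):
    """Average distance (ADD) between the model points transformed by the
    ground-truth pose and by the estimated pose, in the model's units."""
    pts_gt = model_points @ R_gt.T + t_gt
    pts_est = model_points @ R_est.T + t_est
    return float(np.linalg.norm(pts_gt - pts_est, axis=1).mean())

def auc_up_to(errors, max_threshold, steps=1000):
    """Area under the accuracy-vs-threshold curve from 0 to max_threshold,
    normalized to [0, 1]. This is my reading of what 'auc at t' reports;
    whether the thresholds are in meters or centimeters depends on the model
    scale, so a frame or unit mismatch inflates the mean error enormously."""
    errors = np.asarray(errors)
    thresholds = np.linspace(0.0, max_threshold, steps)
    accuracies = [(errors < t).mean() for t in thresholds]
    return float(np.trapz(accuracies, thresholds) / max_threshold)
```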

Ahhh yeah, I used different 3D models, I am sorry. I think I have them on my computer at home. I will update you when I get home; I might forget because of CVPR, so please ping me next week if I do.

@TontonTremblay thank you for the prompt response! Were you perhaps able to find the correct 3d models?

@TontonTremblay Thank you! Unfortunately even using the 3D model with the correct texture did not improve the results. I have not changed anything else compared to my initial comment. Would it be possible to offer your insight on the matter?

The 3D models I trained on are different from the original YCB space, and different from the BOP ones. Our transforms are defined here: https://github.com/NVIDIA/Dataset_Utilities. I am sorry this is a mess, and it has been for a little while (everyone keeps changing the models). Retraining on BOP would probably be a good idea, but I do not have access to the compute to do it again.
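In case it helps to spell this out: if each convention is defined by a fixed model-space transform to some common canonical frame, the correction between two conventions is just a composition. A rough sketch with placeholder matrices (the real fixed transforms live in the Dataset_Utilities metadata, not the identities used here):

```python
import numpy as np

# Hypothetical 4x4 fixed transforms mapping each convention's model frame to a
# common canonical frame; replace the identities with the actual NVDU values.
T_dope_to_canonical = np.eye(4)
T_bop_to_canonical = np.eye(4)

# Re-expresses a point given in the DOPE model frame in the BOP model frame.
T_dope_to_bop = np.linalg.inv(T_bop_to_canonical) @ T_dope_to_canonical

def to_bop_convention(T_cam_from_model_dope):
    """Convert a 4x4 camera-from-model pose estimated against the
    DOPE-convention model so it can be compared with BOP-convention
    ground truth."""
    return T_cam_from_model_dope @ np.linalg.inv(T_dope_to_bop)
```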

@TontonTremblay To be honest, I am a little confused. I was under the impression that the weights you provide (these ones) were produced by training on the shiny meat can dataset. Which weights are the correct ones for this specific dataset, and what are the correct values for the DOPE inference configuration (thresh_angle, thresh_map, sigma, thresh_points) with those weights? Also, thinking about the issue further, the 3D model texture shouldn't matter for inference, since from what I understand only the dimensions of the model's cuboid are used, and those are consistent across all the 3D models I tried. Again, thank you for your time, and sorry to be persistent.
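(For reference, I believe the defaults in the repo's config_pose.yaml for these parameters are roughly the following; shown as a plain dict, and please correct me if the shiny-can weights need different values:)

```python
# My reading of config/config_pose.yaml; treat these values and descriptions
# as an assumption on my part, not as authoritative documentation.
dope_inference_params = {
    "thresh_angle": 0.5,   # max angular deviation allowed when matching affinity vectors
    "thresh_map": 0.01,    # minimum belief-map value considered during peak extraction
    "sigma": 3,            # Gaussian blur applied to the belief maps before peak finding
    "thresh_points": 0.1,  # minimum peak score required to keep a cuboid keypoint
}
```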

Regardless of the 3D model texture, there are four different reference-frame conventions for the 3D model:

  1. the original YCB dataset
  2. YCB-Video
  3. DOPE
  4. the BOP dataset

So if you train with the DOPE reference frame and test against BOP annotations, for example, without taking the DOPE-to-BOP transform into account, you will see a constant offset in the error at test time. As a test, take the ground truth using the model I shared above and the model from BOP, and you will see for yourself.
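Something like the quick comparison below will expose the offset (a rough sketch; the paths are placeholders, and the unit conversion is my assumption, since BOP meshes are in millimeters while the NVDU-aligned ones are in centimeters):

```python
import trimesh

# Placeholder paths: the DOPE-convention model shared above vs. the BOP model.
mesh_dope = trimesh.load("potted_meat_can_dope.obj", force="mesh")
mesh_bop = trimesh.load("potted_meat_can_bop.ply", force="mesh")

# Bring both meshes to the same unit before comparing (mm -> cm).
mesh_bop.apply_scale(0.1)

print("dope extents:", mesh_dope.extents, "center:", mesh_dope.bounds.mean(axis=0))
print("bop  extents:", mesh_bop.extents, "center:", mesh_bop.bounds.mean(axis=0))
# Different centers => a constant translation between the two frames;
# permuted extents => the frames also differ by a rotation.
```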

@TontonTremblay I see! So which reference frame should I use, given the annotations of the shiny meat can dataset? Note that I'm evaluating the inference results using scripts/metrics/add_compute.py.

I would start by debugging things a little bit: pick a 3D model and try to render it at the right location. You will see the differences between the models. Then you can find the transform from one reference frame to the other. But yeah, take baby steps; sorry for not having made these steps easier.
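In case it helps with the baby steps: the lightest-weight check I can think of is to project the model's bounding-box corners with the ground-truth pose and the camera intrinsics and draw them on the frame, rather than doing a full render. A sketch assuming OpenCV, with placeholder paths, intrinsics, and pose:

```python
import cv2
import numpy as np
import trimesh

# Placeholder inputs: an image from the dataset, 3x3 intrinsics, and the
# 4x4 camera-from-model ground-truth pose from the annotation file.
image = cv2.imread("scene_000/000000.png")
K = np.array([[768.2, 0.0, 480.0],
              [0.0, 768.2, 270.0],
              [0.0, 0.0, 1.0]])
T_cam_from_model = np.eye(4)  # substitute the GT pose here

# The 8 corners of the model's axis-aligned bounding box, in the model frame.
# Make sure the model units and the translation units agree (cm vs m vs mm).
mesh = trimesh.load("010_potted_meat_can/google_16k/textured.obj", force="mesh")
corners = trimesh.bounds.corners(mesh.bounds)  # shape (8, 3)

rvec, _ = cv2.Rodrigues(T_cam_from_model[:3, :3])
tvec = T_cam_from_model[:3, 3]
pts2d, _ = cv2.projectPoints(corners.astype(np.float64), rvec, tvec, K, None)

for u, v in pts2d.reshape(-1, 2).astype(int):
    cv2.circle(image, (int(u), int(v)), 4, (0, 255, 0), -1)
cv2.imwrite("gt_cuboid_overlay.png", image)
# If the projected corners do not land on the can, the model's reference
# frame does not match the frame the annotations were written in.
```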