NVlabs/Deep_Object_Pose

Incomplete cuboid detection (Real hardware detection)

Opened this issue · 21 comments

Hi, today I transferred my trained .pth file to detection on real hardware. My object is the red mug from the YCB benchmark. However, with everything set up, detection performs very poorly and usually cannot find any object. On the rare occasion it does detect the mug, this error occurs:

Incomplete cuboid detection.
result from detection: [None, None, None, None, None, None, None, None, (351.5094017726776, 92.23743706236147)]
Skipping.
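
For reference, the list above has nine slots. My reading (an assumption, not taken from the DOPE source) is that these are the 8 projected cuboid corners plus the centroid, and the detection is skipped whenever any corner peak is missing. A minimal sketch of that check:

```python
# Hedged sketch (assumed logic, not the actual DOPE source): each detection
# needs 9 belief-map peaks -- 8 projected cuboid corners plus the centroid.
def cuboid_complete(points):
    """True only if all 9 keypoints were found (no None entries)."""
    return len(points) == 9 and all(p is not None for p in points)

# The result printed in the log: only the centroid peak was found.
result = [None] * 8 + [(351.5094017726776, 92.23743706236147)]

if not cuboid_complete(result):
    print("Incomplete cuboid detection. Skipping.")
```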

Screenshot from 2024-10-01 10-40-03

Furthermore, I would admit my training results are also very poor. Here are my results for several images:

00016

00125

00444

01874

Here are the commands I used to generate the data with nvisii_data_gen and to train:

$ python single_video_pybullet.py --path_single_obj model/mug/textured.obj --scale 1 --nb_frames 10000 --nb_distractors 10 --nb_objects 25

$ python -m torch.distributed.launch --nproc_per_node=1 train.py --data /home/xiangtianyi/src/Deep_Object_Pose/data_generation/nvisii_data_gen/output/output_example --object obj --batchsize 10 --lr 0.001 --epochs 100

Also, I cannot assign the correct object class name in the data-generation output; it always appears as "obj".

Screenshot from 2024-10-01 10-53-33

 
Could you give me any suggestions on these problems? I would greatly appreciate it!

Ok wow, yeah, it is not looking too good. Can you share some belief map results?

As for the class name, I think there is a split on the "." somewhere; you might need to fiddle with this in the data generation script to get the right name.
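
A hedged illustration of that split (the exact line in the generation script may differ): if the filename is split on "." and the last piece is kept, "textured.obj" collapses to "obj"; using `os.path.splitext` on the basename keeps the intended name instead.

```python
import os

path = "model/mug/textured.obj"
name = os.path.basename(path)      # "textured.obj"

# If the script keeps the last piece after splitting on ".", the class
# name collapses to the file extension:
print(name.split(".")[-1])         # -> obj

# Keeping the stem instead preserves the intended name:
print(os.path.splitext(name)[0])   # -> textured
```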

Screenshot from 2024-10-02 09-38-02
Screenshot from 2024-10-02 09-38-09

Screenshot from 2024-10-02 09-38-16
Screenshot from 2024-10-02 09-38-27

It doesn't look good. Sad.
Could you help me with it?

It looks good to me; the points it found are well placed around the mug. Looking at your data, I think the mugs are getting too big in the image. Also, the mug is symmetrical when you can't see the handle, but that doesn't look too problematic. Also, I thought that mug was red. Is it not?

Thank you for your reply. But what do you mean by the mugs getting too big in the image? Does that mean I need to reduce "--scale 1" in the video generation script? Also, the mug is red in color; is that a problem that needs to be fixed?

They are getting too close to the camera, so they look too big. The convolutional receptive fields are not well equipped to capture these wide spatial relationships. Switching to a transformer architecture could fix that problem (but I did not do that).

I am sure that if you generate a dataset with 2-3 mugs and change the plane locations to be further away, your weights would work just fine. Did you try on a webcam?
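
As a sketch, reusing only the flags already shown earlier in this thread (the only change is `--nb_objects`; whether 3 instances is enough is an assumption):

```shell
python single_video_pybullet.py \
    --path_single_obj model/mug/textured.obj \
    --scale 1 \
    --nb_frames 10000 \
    --nb_distractors 10 \
    --nb_objects 3
```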

I will try that. I have been using a RealSense D435 camera, but only the color stream. With the weights you trained, it can detect the sugar box and ketchup very well, but it cannot even detect the mug with a complete cuboid. Sad.

By the way, how do I change the plane locations to be further away?
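
A hypothetical sketch of the kind of change presumably meant (all names here are made up; the real variables in single_video_pybullet.py will differ):

```python
import random

# Hypothetical sketch only -- the real variable names in
# single_video_pybullet.py differ. The idea: raise the distance range
# from which object (and plane) positions are sampled, so everything
# ends up further from the camera.
def sample_object_position(near=1.5, far=4.0):
    x = random.uniform(-0.5, 0.5)
    y = random.uniform(-0.5, 0.5)
    z = random.uniform(near, far)   # larger near/far => further away
    return (x, y, z)

x, y, z = sample_object_position()
```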

Hi, sorry for the delayed response. I still have problems detecting my object, the red mug. DOPE publishes nothing about the mug's pose or dimensions. However, I find the belief maps are quite good.

Screenshot from 2024-10-05 02-51-31

Here is my intermediate training result; the training is still running.

Screenshot from 2024-10-04 14-53-57
Screenshot from 2024-10-04 14-54-10
Screenshot from 2024-10-04 14-54-16
Screenshot from 2024-10-04 14-54-25

But I can detect the sugar box very well. Is there something important for publishing the data that I forgot to do?
Screenshot from 2024-10-05 03-01-35
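
For reference, here is the kind of entry I believe the mug needs in the inference config, next to the pretrained objects (key names assumed from a typical DOPE `config_pose.yaml`; the paths and values below are placeholders):

```yaml
# Hypothetical fragment; key names assumed, values are placeholders.
weights:
    mug: "package://dope/weights/mug.pth"

dimensions:
    mug: [9.0, 9.0, 12.0]   # cuboid size in cm -- must match the real mug

class_ids:
    mug: 1
```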

I have also changed textured.obj so it makes more sense. The detection output on the synthetic data is OK; here is an example. But why can't I detect anything when I move to the hardware camera?

Screenshot from 2024-10-05 05-52-47

@TontonTremblay Hi, could you help me with that? I would really appreciate it!

Yeah, this looks good. In the real-world test, I think your object might be too close to the camera; did you try moving it back a little more? Also make sure you show it with the handle! I am not sure how to interpret the results on the real-world mug; the beliefs are clearly not correct. When I have seen this in the past, the network was confused by the symmetries and/or the object was too close to the camera.

Thank you for your reply. But it outputs almost nothing. Sad...

Screenshot from 2024-10-05 06-13-29

Screenshot from 2024-10-05 06-16-07

Hmmm, I would think the model did not train long enough. You are seeing good results on the training data, though?

Here is the output on the training data, produced by inference.py.

00006
00007
00008

00004
00005

Sad, but I am really in a hurry to use this detection output. I don't know why your sugar weights work while my mug weights do not, given that they can detect the mug in inference.py. But... could you help me train this mug? I can give you the YCB link to this mug's data...

I would love to help with more than just insights, but I am sorry, I cannot do much more than that. The results you got on the training data are very encouraging; maybe try with different lighting, or more of it?