NVlabs/Deep_Object_Pose

Incomplete cuboid detection (Real hardware detection)

Opened this issue · 21 comments

Hi, today I transferred my trained .pth file to detection on real hardware. My object is the red mug from the YCB benchmark. However, with everything set up, detection performs very poorly and usually cannot find any object. On the rare occasion it does detect the mug, this error occurs:

Incomplete cuboid detection.
result from detection: [None, None, None, None, None, None, None, None, (351.5094017726776, 92.23743706236147)]
Skipping.
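
For reference, the list above has nine slots. My reading (an assumption, not taken from the DOPE source) is that these are the 8 projected cuboid corners plus the centroid, and the detection is skipped whenever any corner peak is missing. A minimal sketch of that check:

```python
# Hedged sketch (assumed logic, not the actual DOPE source): each detection
# needs 9 belief-map peaks -- 8 projected cuboid corners plus the centroid.
def cuboid_complete(points):
    """True only if all 9 keypoints were found (no None entries)."""
    return len(points) == 9 and all(p is not None for p in points)

# The result printed in the log: only the centroid peak was found.
result = [None] * 8 + [(351.5094017726776, 92.23743706236147)]

if not cuboid_complete(result):
    print("Incomplete cuboid detection. Skipping.")
```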

Screenshot from 2024-10-01 10-40-03

Furthermore, I would admit my training results are also very poor. Here are my results for several images:

00016

00125

00444

01874

Here are the commands I used to generate the data with nvisii_data_gen and to train:

$ python single_video_pybullet.py --path_single_obj model/mug/textured.obj --scale 1 --nb_frames 10000 --nb_distractors 10 --nb_objects 25

$ python -m torch.distributed.launch --nproc_per_node=1 train.py --data /home/xiangtianyi/src/Deep_Object_Pose/data_generation/nvisii_data_gen/output/output_example --object obj --batchsize 10 --lr 0.001 --epochs 100

Also, I cannot assign the correct object class name in the data-generation output; it always appears as "obj".

Screenshot from 2024-10-01 10-53-33

 
Could you give me any suggestions on these problems? I would greatly appreciate it!

Ok wow, yeah, it is not looking too good. Can you share some belief map results?

As for the class name, I think there is a split on the "." somewhere; you might need to fiddle with this in the data generation script to get the right name.
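
A hedged illustration of that split (the exact line in the generation script may differ): if the filename is split on "." and the last piece is kept, "textured.obj" collapses to "obj"; using `os.path.splitext` on the basename keeps the intended name instead.

```python
import os

path = "model/mug/textured.obj"
name = os.path.basename(path)      # "textured.obj"

# If the script keeps the last piece after splitting on ".", the class
# name collapses to the file extension:
print(name.split(".")[-1])         # -> obj

# Keeping the stem instead preserves the intended name:
print(os.path.splitext(name)[0])   # -> textured
```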

Screenshot from 2024-10-02 09-38-02
Screenshot from 2024-10-02 09-38-09

Screenshot from 2024-10-02 09-38-16
Screenshot from 2024-10-02 09-38-27

It doesn't look good. Sad.
Could you help me with it?

It looks good to me; the points it found are well placed around the mug. Looking at your data, I think the mugs are getting too big in the image. Also, the mug is symmetrical when you can't see the handle, but that doesn't look too problematic. Also, I thought that mug was red. Is it not?

Thank you for your reply. But what do you mean by the mugs getting too big in the image? Does that mean I need to reduce "--scale 1" in the video generation script? Also, the mug is red in color; is that a problem that needs to be fixed?

They are getting too close to the camera, so they look too big. The convolutional receptive fields are not well equipped to capture these wide spatial relationships. Switching to a transformer architecture could fix that problem (but I did not do that).

I am sure that if you generate a dataset with 2-3 mugs and change the plane locations to be further away, your weights would work just fine. Did you try on a webcam?
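
As a sketch, reusing only the flags already shown earlier in this thread (the only change is `--nb_objects`; whether 3 instances is enough is an assumption):

```shell
python single_video_pybullet.py \
    --path_single_obj model/mug/textured.obj \
    --scale 1 \
    --nb_frames 10000 \
    --nb_distractors 10 \
    --nb_objects 3
```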

I will try that. I have been using a RealSense D435 camera, but only the color stream. With the weights you trained, it can detect the sugar box and ketchup very well, but it cannot even detect the mug with a complete cuboid. Sad.

By the way, how do I change the plane locations to be further away?
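
A hypothetical sketch of the kind of change presumably meant (all names here are made up; the real variables in single_video_pybullet.py will differ):

```python
import random

# Hypothetical sketch only -- the real variable names in
# single_video_pybullet.py differ. The idea: raise the distance range
# from which object (and plane) positions are sampled, so everything
# ends up further from the camera.
def sample_object_position(near=1.5, far=4.0):
    x = random.uniform(-0.5, 0.5)
    y = random.uniform(-0.5, 0.5)
    z = random.uniform(near, far)   # larger near/far => further away
    return (x, y, z)

x, y, z = sample_object_position()
```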

Hi, sorry for the delayed response. I still have problems detecting my object, the red mug. DOPE publishes nothing about the mug's pose or dimensions. However, I find the belief maps are quite good.

Screenshot from 2024-10-05 02-51-31

Here is my intermediate training result; the training is still running.

Screenshot from 2024-10-04 14-53-57
Screenshot from 2024-10-04 14-54-10
Screenshot from 2024-10-04 14-54-16
Screenshot from 2024-10-04 14-54-25

But I can detect the sugar box very well. Is there something important for publishing the data that I forgot to do?
Screenshot from 2024-10-05 03-01-35
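
For reference, here is the kind of entry I believe the mug needs in the inference config, next to the pretrained objects (key names assumed from a typical DOPE `config_pose.yaml`; the paths and values below are placeholders):

```yaml
# Hypothetical fragment; key names assumed, values are placeholders.
weights:
    mug: "package://dope/weights/mug.pth"

dimensions:
    mug: [9.0, 9.0, 12.0]   # cuboid size in cm -- must match the real mug

class_ids:
    mug: 1
```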

I have also changed textured.obj so it makes more sense. The detection output on the synthetic data is OK; here is an example. But why can't I detect anything when I move to the hardware camera?

Screenshot from 2024-10-05 05-52-47

@TontonTremblay Hi, could you help me with that? I would really appreciate it!

Yeah, this looks good. In the real-world test, I think your object might be too close to the camera; did you try moving it back a little more? Also make sure you show it with the handle! I am not sure how to interpret the results on the real-world mug; the beliefs are clearly not correct. When I have seen this in the past, the network was confused by the symmetries and/or the object was too close to the camera.

Thank you for your reply. But it outputs almost nothing. Sad...

Screenshot from 2024-10-05 06-13-29

Screenshot from 2024-10-05 06-16-07

Hmmm, I would think the model did not train long enough. You are seeing good results on the training data, though?

Here is the output on the training data, produced by inference.py.

00006
00007
00008

00004
00005

Sad, but I am really in a hurry to use this detection output. I don't know why your sugar weights work while my mug weights do not, given that they can detect the mug in inference.py. But... could you help me train this mug? I can give you the YCB link to this mug's data...

I would love to help with more than just insights, but I am sorry, I cannot do much more than that. The results you got on the training data are very encouraging; maybe try with different lighting, or more of it?