Custom training model won't recognise anything
Closed this issue · 11 comments
Hello,
Firstly, thank you very much for your responsiveness and for maintaining DOPE! I am trying to train DOPE with new synthetic data to test whether it will improve the detection of our objects (which have a slightly different appearance from the ones you originally trained DOPE with). However, there is some strange behaviour that I will describe below.
Details about my training process
- I use this script to generate the synthetic data.
- I modified the above script, as well as the one it calls (`single_video_pybullet.py`), so that they do not put any random objects in the scene other than my custom object (a Cheezit with a custom texture). I specify my custom object using `--path_single_obj path/to/my/model.obj`. That is, all synthetic images include our object only (or occasionally no object, perhaps because the randomiser places it far away or outside the image boundaries?).
- I generate 20,000 images. These are generated under `dope/scripts/nvisii_data_gen/output/dataset`, which contains 100 directories numbered `000`, `001`, etc. Each numbered directory contains 200 data points (four files per data point: a `.png` file, a `.depth.exr` file, a `.json` file, and a `.seg.exr` file).
- I then split these 100 directories in two. I keep 80% in `dataset` and created `dope/scripts/nvisii_data_gen/output/test_data` to hold the remaining 20%.
- I then use this script to train DOPE. The exact command I use is the following:
python3 train.py \
--data nvisii_data_gen/output/dataset/ \
--datatest nvisii_data_gen/output/test_data/ \
--object single_obj_0 \
--epochs 10 \
--batchsize 24
- This training results in a directory `train_tmp/` which includes `header.txt`, `loss_test.csv`, `loss_train.csv`, `test_metric.csv`, and a `.pth` file for each epoch.
- I pick one of the `.pth` files from `train_tmp` and place it under `dope/weights`. I then modify `config/config_pose.yaml` to let DOPE know about the new weights file.
- I then use the launch file to start DOPE.
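In case it helps anyone reproduce the setup, the 80/20 directory split described above can be scripted. A minimal sketch (the function name, the fixed seed, and the choice to move whole numbered directories rather than individual files are my own, not part of DOPE's tooling):

```python
import random
import shutil
from pathlib import Path

def split_dataset(dataset_dir, test_dir, test_fraction=0.2, seed=0):
    """Move a random test_fraction of the numbered sub-directories
    (000, 001, ...) out of dataset_dir into test_dir."""
    dataset_dir, test_dir = Path(dataset_dir), Path(test_dir)
    test_dir.mkdir(parents=True, exist_ok=True)
    subdirs = sorted(d for d in dataset_dir.iterdir() if d.is_dir())
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    held_out = rng.sample(subdirs, int(len(subdirs) * test_fraction))
    for d in held_out:
        shutil.move(str(d), str(test_dir / d.name))
    return len(held_out)
```

Splitting at the directory level (rather than per image) keeps each 200-image batch intact on one side of the split.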
What happens with a real image view
When I use the new `.pth` file, DOPE fails to recognize any valid points, even though a Cheezit is clearly within the RealSense camera's view. However, if I alter the `config_pose.yaml` file to replace my custom `.pth` file with the pre-trained `cracker_60.pth` that you provide, DOPE successfully detects our Cheezit despite the slight differences in appearance between our Cheezit and yours!
Sanity checks - strange behaviour
Considering these observations, I created a mock 'realsense' ROS script that constantly publishes an image from the synthetic dataset. I then ran DOPE using my custom `.pth` file to check whether it would work on an image from the training dataset. The issue persisted: DOPE fails to detect any valid points with my custom `.pth` file. I then used the dummy script again but switched to the `cracker_60.pth` file instead of my custom one. With your `cracker_60.pth` file it sort of worked; however, it throws the following error:
...
cv2.solvePnP failed with an error
9 valid points found
cv2.solvePnP failed with an error
9 valid points found
cv2.solvePnP failed with an error
9 valid points found
cv2.solvePnP failed with an error
9 valid points found
...
For reference, here is the image my dummy script constantly publishes (i.e., an image from the synthetic dataset):
I would appreciate any pointers to address this.
Thank you in advance.
It sounds like you are doing the right thing. Have you visualized the belief maps? If you are using inference.py I think you can pass --show_beliefs. In the ROS node you need to pass it through the config, I think; then in ROS you have access to a topic for the beliefs. If the belief maps look good, then you start debugging the PnP part; possibly the cuboid size is creating problems. It also might be that the point ordering in the version you trained is wrong (which I hope is not the case, it would be annoying to debug, but it is possible). You can also look at these two lines: https://github.com/NVlabs/Deep_Object_Pose/blob/master/src/dope/inference/cuboid_pnp_solver.py#L100-L101. Report the belief maps and then we can see.
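To make the cuboid-size point concrete: the PnP solver matches the nine detected keypoints against the eight corners plus centroid of the object's 3D cuboid, so wrong dimensions shift every correspondence. A rough numpy sketch of those nine model-frame points (the corner ordering here is illustrative only; DOPE's actual ordering lives in its cuboid code, so check that file before relying on this):

```python
import numpy as np

def cuboid_points_3d(width, height, depth):
    """Eight corners plus the centroid of an axis-aligned cuboid
    centered at the origin; points like these are what a PnP solver
    pairs with the nine 2D keypoints from the belief maps."""
    w, h, d = width / 2.0, height / 2.0, depth / 2.0
    corners = np.array([(x, y, z)
                        for x in (-w, w)
                        for y in (-h, h)
                        for z in (-d, d)], dtype=np.float64)
    return np.vstack([corners, np.zeros((1, 3))])  # shape (9, 3)
```

If the dimensions fed to the solver do not match the object the network was trained on, the 2D-3D correspondences are inconsistent and solvePnP can fail even with nine valid points.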
Hello @TontonTremblay,
Thanks for getting back to me. Here is the belief map using my custom `.pth` file:
Here is the belief map using your `cracker_60.pth` file:
I had a look at the training numbers, and they look a little suspicious (too good to be true?): the loss during testing goes down to 1.14e-08!
This almost makes me think that the training script isn't reading the "ground truth" properly. Here is an example `.json` file from the synthetic data:
{
"camera_data": {
"camera_look_at": {
"at": [
1.0,
0.0,
0.0
],
"eye": [
0.0,
0.0,
0.0
],
"up": [
0.0,
0.0,
1.0
]
},
"camera_view_matrix": [
[
0.0,
0.0,
1.0,
0.0
],
[
-1.0,
0.0,
0.0,
0.0
],
[
0.0,
-1.0,
0.0,
0.0
],
[
0.0,
0.0,
0.0,
1.0
]
],
"height": 500,
"intrinsics": {
"cx": 250.0,
"cy": 250.0,
"fx": 603.5535278320312,
"fy": 603.5535278320312
},
"location_worldframe": [
-0.0,
0.0,
-0.0
],
"quaternion_xyzw_worldframe": [
-0.5,
0.5,
-0.5,
0.5
],
"width": 500
},
"objects": [
{
"bounding_box_minx_maxx_miny_maxy": [
80,
289,
361,
500
],
"class": "obj",
"local_cuboid": null,
"local_to_world_matrix": [
[
1.8920022249221802,
-0.16087310016155243,
-0.6280502080917358,
-0.0
],
[
-0.6219491362571716,
-0.9973906874656677,
-1.6181448698043823,
0.0
],
[
-0.18304753303527832,
1.7260745763778687,
-0.9935604333877563,
-0.0
],
[
1.5750807523727417,
-0.03503342717885971,
-0.4613076448440552,
1.0
]
],
"location": [
0.03503342717885971,
0.4613076448440552,
1.5750807523727417
],
"location_worldframe": [
1.5750807523727417,
-0.03503342717885971,
-0.4613076448440552
],
"name": "single_obj_0",
"projected_cuboid": [
[
95.4332947731018,
444.68867778778076
],
[
76.83844864368439,
444.17738914489746
],
[
134.66638326644897,
572.567343711853
],
[
149.38680827617645,
561.771810054779
],
[
231.4603179693222,
363.9290928840637
],
[
225.31068325042725,
356.3215136528015
],
[
292.0815348625183,
475.01134872436523
],
[
292.8772568702698,
472.88691997528076
],
[
187.41339445114136,
458.8717818260193
]
],
"provenance": "nvisii",
"px_count_all": 0,
"px_count_visib": 0,
"quaternion_xyzw": [
0.3544142246246338,
0.9339839220046997,
-0.006243467330932617,
-0.927586019039154
],
"quaternion_xyzw_worldframe": [
1.104870319366455,
-0.17712989449501038,
-0.18352781236171722,
-0.7566995620727539
],
"segmentation_id": 2,
"visibility": 1
}
]
}
Based on the above `.json` file, I train DOPE with the following command:
python3 train.py \
--data nvisii_data_gen/output/dataset/ \
--datatest nvisii_data_gen/output/test_data/ \
--object single_obj_0 \
--epochs 10 \
--batchsize 24
Is passing `--object single_obj_0` correctly pointing the script to the `single_obj_0` name in the `.json` file?
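To double-check what `--object single_obj_0` is supposed to match, I mimicked the name filter with a tiny loader (my own helper, not DOPE's actual parser) against annotation files like the one above:

```python
import json

def load_projected_cuboids(json_path, object_name=None):
    """Return {name: projected_cuboid} for each annotated object,
    optionally keeping only objects whose `name` field matches."""
    with open(json_path) as f:
        data = json.load(f)
    return {obj["name"]: obj["projected_cuboid"]
            for obj in data["objects"]
            if object_name is None or obj["name"] == object_name}
```

If this returns an empty dict for the name you pass on the command line, the training script would effectively see no ground truth at all.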
Yeah, your model is not working, sorry. Can you check the tensorboard images in the training dir? You will see the annotations from your data. I think you did not train long enough, tbh. It is a big network to train.
Hello @TontonTremblay, I am not sure how to get those? The `train.py` script just creates a `train_tmp` dir which only includes `.pth` files and a few `.csv` files.
can you try with train2.py ?
Unfortunately that script isn't runnable on my machine due to CUDA issues. I will try a few more things to get it running, but I have the feeling that my CUDA version (12.0) and the PyTorch version it supports conflict with the code, which throws exceptions. I know it's been a while since you ran that script, but do you know which CUDA / PyTorch versions you were using?
So if you check train2.py, there is a part where tensorboard is used to save belief maps. Just take that part and manually save belief maps as you train. This will help you know when it is trained enough.
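For anyone following along, here is a numpy-only sketch of that idea: tile the nine belief maps into one grid image that can be written out with any image library at each logging step (this helper is mine, not the tensorboard code from train2.py):

```python
import numpy as np

def tile_belief_maps(beliefs, cols=3):
    """Tile an (n, h, w) stack of belief maps into one
    (rows*h, cols*w) uint8 grid, normalising each map to
    [0, 255] for quick visual inspection during training."""
    n, h, w = beliefs.shape
    rows = -(-n // cols)  # ceiling division
    grid = np.zeros((rows * h, cols * w), dtype=np.uint8)
    for i, bmap in enumerate(beliefs):
        # per-map normalisation; epsilon guards against flat maps
        norm = (bmap - bmap.min()) / (bmap.max() - bmap.min() + 1e-8)
        r, c = divmod(i, cols)
        grid[r * h:(r + 1) * h, c * w:(c + 1) * w] = (norm * 255).astype(np.uint8)
    return grid
```

With nine maps and `cols=3` this gives a 3x3 contact sheet; a well-trained network should show one sharp peak per map near the corresponding cuboid keypoint.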
Hi @TontonTremblay,
A quick update: apparently, everything works. When I used `train.py` I was passing `--object single_obj_0` to tell it which object to train on, but I then realised that the CSV parser in the script was failing to pick anything up. I simply removed that flag when training and everything seems to work.
I realised, when comparing my custom trained model (which uses a different Cheezit texture, matching the object we have in the lab) with yours, that yours still performs better. I am trying to understand what I am doing wrong:
- Reading your DOPE paper, I see that you generated 60k photorealistic images and mixed them with 60k domain-randomised images.
- I generated 120,000 images using your nvisii script here and trained with these. I thought that the script would generate noisy images as a proxy for domain-randomised images?
Can I please ask whether you used a different approach to train DOPE? Did you also use the nvisii script to generate all the data? There are some flags in that script, like `--motionblur`, which I did not use, for example.
Many thanks.
Wow, amazing job; sorry it took a lot of time to get there. I think the problem is more related to data diversity. A domain-randomization-style dataset works well, but will never work as well as mixing data styles. I think the original DOPE paper has some experiments about mixing different percentages of DR and photorealistic data.
When recreating this with your own data, the problem becomes recreating the same sort of data; here I only shared half of the solution. DR will get you to a solution that works fine for most cases. For example, for our HOPE models I only used DR, as generating that data is quite simple. Generating photorealistic data is a lot more work, e.g., finding 3d scenes that are correctly lit up is harder. I had started working on some solutions in nvisii for this, but ended up having to put it aside.
So what solution do I have for you? I have one, but you will have to learn a new tool; sorry, this is somewhat my fault. A while ago I wanted to learn more about Blender, and I rewrote part of the data export we used in nvisii in Blender. You can check this one: https://github.com/TontonTremblay/mvs_objaverse#falling-scene. These scenes are not quite the same as what we used for DR, but they would probably get you to where you want to go.
If you want my real two cents, I would not go down this direction. I would keep the weights you have, use them as a detector, and then use something like MegaPose and/or Diff-DOPE to get a really good pose. Or, if you want something faster, you could check https://github.com/nv-nguyen/gigaPose, which runs quite fast; although the code is not yet available, it should be in the next couple of days. Anyway, I would say DOPE is accessible, but in general it takes an older approach to pose estimation. Sorry this is messy; I am working on a new pipeline to simplify this process.
Hi @TontonTremblay,
Thank you very much for all the information! I will have a look around and see how to move forward. I have seen your HOPE paper, and we have ordered those objects so we can use your models; hopefully we will get more accurate DOPE predictions with the HOPE objects, since they should be identical to the ones you trained with.
I would like to thank you for being so responsive and for offering so much to the community, including high-quality code and projects. Thanks!
It is a pleasure to be as helpful as possible. I appreciate the kind words, it motivates me to continue what we are doing. ❤️