Learning custom data always 0%

Question

Opened this issue a year ago · 8 comments

I have defined a custom object, following the Ketchup example, and generated 1000 images with the following command:

python single_video_pybullet.py --nb_frames 1000 --scale 0.015 --path_single_obj ~/Deep_Object_Pose/scripts/nvisii_data_gen/models/iros_block/google_16k/textured_simple.obj --nb_distractors 0 --nb_object 5

I was able to obtain the 1000 images, each containing n objects, like this:

Then I tried to train on that set of images with:

python -m torch.distributed.launch --nproc_per_node=1 train.py --network dope --epochs 25 --batchsize 10 --outf tmp/ --data ../nvisii_data_gen/output/output_example/

I tried different numbers of epochs and batch sizes, and regenerated the image set several times; however, I always get 0% for each epoch:

I am fairly new to machine learning, so I do not know the details in depth. What am I doing wrong?
The CSV files inside the output folder contain no data, only the header. In addition, when I add the --save flag I get no results.

Thank you !

Answer 1 · 2023-09-30T16:05:10.000Z

Let it train to epoch 100, and also check the output in TensorBoard.

tensorboard --logdir /path/to/experiment/

Then open Chrome/Firefox at the localhost address and check the Images tab. Check some other issues here to see what sort of output you should get.

Answer 2 · 2023-10-03T06:21:41.000Z

I tried that; however, I still get 0% for each epoch.

I also tried to use a reduced dataset of 5 images.

From tensorboard I get the following info:

The second epoch is the following:

After more than 50 epochs I have:

Answer 3 · 2023-10-03T07:27:30.000Z

Lower the learning rate a tad. The 0% refers to the data it loads, not the performance. Sorry, I should update this. Can you try on a single image? Normally I test this first.
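
For the single-image test, one way to get a one-frame dataset is to copy a single image and its annotation into a fresh folder and point --data at it. This is only a minimal sketch, not part of the DOPE scripts: the function name make_single_frame_subset is hypothetical, and it assumes each frame is stored as an image file with a .json annotation of the same stem next to it (for example 00000.png and 00000.json). Adjust it if your generated output is laid out differently.

```python
import shutil
from pathlib import Path

# Assumption: frames are saved with one of these image extensions.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def make_single_frame_subset(src_dir, dst_dir, index=0):
    """Copy one image and its same-stem .json annotation into dst_dir.

    Hypothetical helper: it assumes image/annotation pairs share a stem.
    Returns the list of copied paths.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    # Keep only images that have a matching annotation file next to them.
    frames = sorted(
        p for p in src.iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES and p.with_suffix(".json").exists()
    )
    if not frames:
        raise FileNotFoundError(f"no image/.json pairs found in {src}")
    image = frames[index]
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in (image, image.with_suffix(".json")):
        shutil.copy2(path, dst / path.name)
        copied.append(dst / path.name)
    return copied
```

Calling make_single_frame_subset("../nvisii_data_gen/output/output_example/", "tmp_single/") and then training with --data tmp_single/ should overfit that one frame if the pipeline is working.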

Answer 4 · 2023-10-03T08:11:08.000Z

I did a test with lr=0.00001, a single image, 100 epochs, and a batch size of 2.

Results in the end:

Answer 5 · 2023-10-03T08:36:04.000Z

The train belief guess should look like the gt above it. Can you run it for longer, like 1000 epochs?

Answer 6 · 2023-10-03T09:54:18.000Z

I have changed the background of the input image and added symmetry information and trained on the following image:

I ran it for 1000 epochs, and in the end the result was the following:

It seems much better! Do you think that now I can train on a bigger dataset with more instances of objects/distractors?

Answer 7 · 2023-10-03T11:15:47.000Z

Are you aware of the symmetries in your object? Check the section on generating data with symmetries. But yeah, this looks good now. DOPE takes a while to train, so you will have to be patient; for example, on a 60k-image dataset I train for ~30 epochs.
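
To put the training time in perspective, a rough back-of-the-envelope count of weight updates can help. The sketch below is plain arithmetic, not DOPE code. It assumes one optimizer step per batch, a single process, and that a partial last batch is kept. The batch size of 10 for the 60k case is borrowed from the command in the question and is only an assumption here.

```python
import math


def optimizer_steps(num_images, batch_size, epochs):
    """Weight updates, assuming one step per batch and a kept partial last batch."""
    return math.ceil(num_images / batch_size) * epochs


def image_passes(num_images, epochs):
    """How many times an image is shown to the network in total."""
    return num_images * epochs


# 60k images for 30 epochs, with an assumed batch size of 10.
large_steps = optimizer_steps(60_000, 10, 30)
# The single-image sanity check from this thread: 1 image, batch size 2, 1000 epochs.
single_steps = optimizer_steps(1, 2, 1000)

print(large_steps, image_passes(60_000, 30))
print(single_steps, image_passes(1, 1000))
```

Under these assumptions the full run performs far more updates than the 1000-epoch single-image test, so a short run on a large dataset showing poor belief maps is not by itself a sign that something is broken.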

Answer 8 · 2023-10-03T14:09:54.000Z

I will adjust everything and try to run with more images if the PC allows me. Thank you!