shubham-goel/ucmr

Training on new dataset: Loaded cam_pose_dict size issue

dorsadadjoo opened this issue · 3 comments

Hi,

Thank you for releasing the code!
I'm trying to train the model on a new class of objects (dataset size = 2020).
I started by writing a data loader for my own data, similar to your json data loader, and used your template shape.
I recomputed the NMR-initialized camera multiplex and got "campose_0.npz" as output with a length equal to my dataset size (2020).
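(For reference, I verified the size roughly like this; since I wasn't sure of the exact key names inside the file, I just printed every array's shape:)

```python
import numpy as np

# Rough sanity check of the recomputed multiplex: every array in campose_0.npz
# should have a leading dimension equal to the dataset size (2020 here).
# Key names depend on the actual .npz layout, so just print all of them.
data = np.load('campose_0.npz', allow_pickle=True)
for key in data.files:
    print(key, np.asarray(data[key]).shape)
```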
Then I ran steps 1 and 2 (train shape+texture and prune the top 4 camera poses).
In step 3, I got the following error:

Found display : :0
.
.
.
2020 images
.
.
.
verts:      mean-centering by [ 0.          0.5093799  -0.01742622]
verts_uv:   provided
faces_uv:   from verts_uv
Mesh contains 237x2=474 symmetric vertices, 81 indep vertices
textureImg:     128x256
Loaded cam_pose_dict of size 2016 (should be 2020) 
.
.
.
line 142, in reloadCamsFromDict
   assert((_kk.sort()[0]==torch.arange(_kk.shape[0])).all()) # keys are 0 -> n-1
AssertionError

I checked the inputs and outputs and noticed that in the first step of training, an array of size 2020 is fed to camOpt_shape.py, but the output arrays created in the "train_cam8x5" directory have a length of 2016 (4 instances missing).
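Roughly how I checked (a simplified stand-in for my actual debug script; the demo dict and the missing indices below are made up):

```python
import torch

# Simplified version of my debug check: reproduce the failing assertion and report
# which image indices are absent. `cam_pose_dict` stands for whatever
# reloadCamsFromDict builds from the saved files.
def check_cam_pose_dict(cam_pose_dict, dataset_size):
    keys = torch.tensor(sorted(cam_pose_dict.keys()))
    contiguous = bool((keys == torch.arange(len(keys))).all())
    missing = sorted(set(range(dataset_size)) - set(keys.tolist()))
    print(f'{len(keys)}/{dataset_size} poses loaded; missing indices: {missing}')
    return contiguous and len(keys) == dataset_size

# A dict covering only 2016 of 2020 images (four scattered indices dropped with the
# last shuffled batch) fails the keys == arange(n) check, matching the error above.
demo = {i: None for i in range(2020) if i not in (3, 517, 1200, 1999)}
check_cam_pose_dict(demo, dataset_size=2020)
```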

I would really appreciate any hints about what might be going wrong with my training data during the first step.

Hi @dorsadadjoo, thank you for your interest in our work!

In every training epoch, the last incomplete batch of the dataset isn't saved into the raw_20.npz file, which is why 4 of your 2020 instances are missing. Therefore, while trimming cameras, please use --input_file=raw_20.npz --mergewith=raw_19.npz to pull the missing instances from the previous epoch's file.
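Conceptually, the merge just backfills whatever indices are missing from the latest epoch's dump with the entries saved in the previous epoch. A simplified sketch (the per-index values are placeholders, not the exact raw_*.npz contents):

```python
# Simplified sketch of the merge: start from the previous epoch's entries and let
# the latest epoch's entries take precedence wherever they exist.
def merge_cam_dumps(latest, previous):
    merged = dict(previous)
    merged.update(latest)
    return merged

latest = {i: ('epoch20', i) for i in range(2020) if i not in (3, 517, 1200, 1999)}
previous = {i: ('epoch19', i) for i in range(2020) if i not in (42, 88, 901, 1750)}
merged = merge_cam_dumps(latest, previous)
print(len(merged))  # 2020: each epoch usually drops a different incomplete batch
```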

Thanks, that's right.
Another workaround could be to change the batch size to a factor of the dataset size and retrain.
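With 2020 images, that means picking a batch size from the divisors of 2020:

```python
# Batch sizes that divide 2020 evenly, so no incomplete batch is ever dropped.
dataset_size = 2020
print([b for b in range(1, dataset_size + 1) if dataset_size % b == 0])
# [1, 2, 4, 5, 10, 20, 101, 202, 404, 505, 1010, 2020]
```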

Yes! Closing this, please reopen if need be.