sunset1995/HorizonNet

Something about Dataloader

limchaos opened this issue · 3 comments

torch.utils.data.DataLoader will use the same random seed in every batch. That means every batch will use the same augmentation. You can have a look at my test:
visulize_dataloder.pdf
Simply adding worker_init_fn=lambda x: np.random.seed() will generate different random numbers for each batch, which might make your Pano Stretch Data Augmentation more powerful (a short sketch follows the references).
reference:
pytorch/pytorch#5059
https://pytorch.org/docs/master/data.html#torch.utils.data.DataLoader
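
Here is a minimal sketch of the behavior and the fix; ToyDataset and all numbers are made up, with a uniform draw standing in for a stretch factor. It assumes the fork start method on Linux, where workers inherit numpy's global state, and a PyTorch of this issue's era (newer releases may already re-seed numpy per worker):

```python
import numpy as np
from torch.utils.data import Dataset, DataLoader

# Hypothetical toy dataset: each sample draws an "augmentation factor"
# from numpy's global RNG inside __getitem__, like per-sample stretching.
class ToyDataset(Dataset):
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        kx = np.random.uniform(1.0, 2.0)  # stand-in for a stretch factor
        return idx, kx

if __name__ == '__main__':
    for fix in (False, True):
        # Without worker_init_fn, forked workers share the parent's numpy
        # state, so the drawn factors repeat across the workers' batches.
        # With np.random.seed() (no argument), each worker re-seeds from
        # OS entropy and the batches differ.
        kwargs = {'worker_init_fn': lambda wid: np.random.seed()} if fix else {}
        loader = DataLoader(ToyDataset(), batch_size=2, num_workers=2, **kwargs)
        print('with fix:' if fix else 'without fix:')
        for idx, kx in loader:
            print(' ', idx.tolist(), [round(float(v), 4) for v in kx])
```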

Thanks for the report!
I have reproduced the unexpected behavior mentioned above and observed that the dataloader yields the same sequence of stretching factors kx, ky in every batch.
Fortunately, the images are shuffled in each epoch, so the images can still be stretched with different kx, ky.
I have experimented with the fix and observed results similar to those reported in the original version. (The fix: 83.70% 3DIoU / 2.08% Pixel Error / 0.69% Corner Error)
I believe the fix would help more if the dataset is smaller, such as in fine-tune mode.
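
If reproducibility of the augmentation stream matters, a common variant is to derive the numpy seed from torch's per-worker seed instead of OS entropy. This is only a sketch; train_dataset and the loader arguments below are placeholders, not HorizonNet's actual code:

```python
import numpy as np
import torch

# Each DataLoader worker gets a distinct torch.initial_seed(), so folding
# it into numpy keeps workers decorrelated, while a fixed torch seed still
# reproduces the exact same augmentations across runs.
def seed_numpy_from_torch(worker_id):
    np.random.seed(torch.initial_seed() % 2**32)

# Placeholder usage (train_dataset and hyper-parameters are assumptions):
# loader = torch.utils.data.DataLoader(
#     train_dataset, batch_size=8, shuffle=True, num_workers=4,
#     worker_init_fn=seed_numpy_from_torch)
```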

BTW, I have added you to the acknowledgement section. Thanks for the report, it is really important to know about this.

: )

Indeed, the images are shuffled, but each image can be augmented in at most n different ways, where n equals the batch size. If the batch size is too small, I guess the performance will be poor.
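
A toy simulation of this point, under the stated assumption that every batch replays the same sequence of draws (all numbers are made up for illustration, not measured from HorizonNet):

```python
import numpy as np

batch_size, num_images, num_epochs = 4, 16, 100

# Under the assumption above, an image's augmentation is determined only
# by its position inside a batch, so one fixed draw per position suffices.
per_position_draws = np.random.RandomState(0).uniform(1.0, 2.0, batch_size)

seen = {i: set() for i in range(num_images)}
shuffle_rng = np.random.RandomState(123)
for _ in range(num_epochs):
    order = shuffle_rng.permutation(num_images)  # images shuffled per epoch
    for b in range(0, num_images, batch_size):
        for pos, img in enumerate(order[b:b + batch_size]):
            seen[img].add(per_position_draws[pos])

# Every image has seen at most batch_size distinct factors.
print(max(len(s) for s in seen.values()))  # -> 4
```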