WeightedRandomSampler does not work properly with DDP
gabrielpan147 opened this issue · 4 comments
Hello,
I'm trying to reproduce the model, but I noticed that when I use WeightedRandomSampler as you do in the code, training is quite slow, and I believe DDP training is not being initialized properly. Have you encountered this issue before, and if so, could you let me know how you solved it?
Best,
Gabriel
Hi, thanks for bringing this up! It does seem that WeightedRandomSampler isn't compatible with DDP training out of the box (there's a similar thread on the issue here: Lightning-AI/pytorch-lightning#10946). I saw no reduction in training time when moving to multiple GPUs, and this is possibly why.
Let me know if you solve it!
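To make the failure mode concrete, here is a minimal sketch of the usual single-process setup (the toy dataset and the `sample_weights` name are illustrative, not from the repo). The problem is that WeightedRandomSampler has no notion of rank or world size, so under DDP every process builds the same sampler and iterates the full weighted epoch on its own:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy imbalanced dataset: 90 samples of class 0, 10 of class 1.
labels = torch.cat([torch.zeros(90, dtype=torch.long),
                    torch.ones(10, dtype=torch.long)])
dataset = TensorDataset(torch.randn(100, 8), labels)

# Inverse-frequency weight for each sample.
class_counts = torch.bincount(labels).float()
sample_weights = 1.0 / class_counts[labels]

sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(dataset),
                                replacement=True)

# Under DDP, every rank would build this identical sampler, so the data
# is never sharded across GPUs and per-epoch work is not divided --
# consistent with seeing no speedup on multiple GPUs.
loader = DataLoader(dataset, batch_size=16, sampler=sampler)
```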
By the way, there are certain datasets where weighted sampling is not required. We used it here because BBBC021 had quite a large class imbalance for some of the labels. An easy fix might be to drop weighted sampling where it isn't needed!
However, I think even that path might not be initialized properly in the current code and probably does need a fix.
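If you do drop the weighted sampling, the usual DDP pattern is one DistributedSampler per process. A minimal sketch, assuming the process group is already initialized (e.g. launched with torchrun) and reusing the toy `dataset` from the sketch above, with `num_epochs` standing in for whatever your training script uses:

```python
from torch.utils.data import DataLoader, DistributedSampler

# DistributedSampler shards the dataset across ranks; it reads the rank
# and world size from the initialized process group.
sampler = DistributedSampler(dataset, shuffle=True)
loader = DataLoader(dataset, batch_size=16, sampler=sampler)

num_epochs = 10  # illustrative
for epoch in range(num_epochs):
    sampler.set_epoch(epoch)  # reshuffle with a different seed each epoch
    for images, targets in loader:
        ...  # forward/backward as usual
```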
Thanks for your reply! https://discuss.pytorch.org/t/how-to-use-my-own-sampler-when-i-already-use-distributedsampler/62143/8 gives a distributed weighted sampler example that I have already used in some of my experiments. I haven't tried it on WS-DINO yet, but you could take a look if you're interested.
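For anyone landing here, the pattern from that thread looks roughly like this (a sketch only; the class name and details are illustrative, and I haven't tried it on WS-DINO). The idea is to have every rank draw the same global weighted sample with a shared per-epoch seed, then keep only its own interleaved slice, mirroring what DistributedSampler does for uniform sampling:

```python
import math
import torch
import torch.distributed as dist
from torch.utils.data import Sampler

class DistributedWeightedSampler(Sampler):
    """Weighted sampling with DDP-style sharding (illustrative sketch)."""

    def __init__(self, weights, num_replicas=None, rank=None, replacement=True):
        if num_replicas is None:
            num_replicas = dist.get_world_size()
        if rank is None:
            rank = dist.get_rank()
        self.weights = torch.as_tensor(weights, dtype=torch.double)
        self.num_replicas = num_replicas
        self.rank = rank
        self.replacement = replacement
        self.epoch = 0
        self.num_samples = math.ceil(len(self.weights) / num_replicas)
        self.total_size = self.num_samples * num_replicas

    def __iter__(self):
        # Same seed on every rank, so the global weighted draw is identical...
        g = torch.Generator()
        g.manual_seed(self.epoch)
        indices = torch.multinomial(self.weights, self.total_size,
                                    self.replacement, generator=g).tolist()
        # ...and each rank keeps a disjoint interleaved slice of it.
        return iter(indices[self.rank:self.total_size:self.num_replicas])

    def __len__(self):
        return self.num_samples

    def set_epoch(self, epoch):
        # Call at the start of every epoch (like DistributedSampler)
        # so different epochs draw different samples.
        self.epoch = epoch
```

You'd pass it to the DataLoader in place of WeightedRandomSampler, e.g. `sampler = DistributedWeightedSampler(sample_weights)`, and call `sampler.set_epoch(epoch)` each epoch.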
I tried to work around it by not using weighted sampling on my dataset, which has binary labels and several "treatments". However, I found the loss did not decrease. For reference, I tested the original DINO and it decreased normally. I haven't found the reason yet but will keep working on it.
Thanks for your great work on this!
@gabrielpan147 Feel free to send me your implementation and I can have a look