WeightedRandomSampler does not work properly with DDP
gabrielpan147 opened this issue · 4 comments
Hello,
I'm trying to reproduce the model, but I noticed that when I use WeightedRandomSampler as you do in the code, training is quite slow, and I believe DDP training is not being initialized properly. Have you encountered this issue before, and if so, could you let me know how you solved it?
Best,
Gabriel
Hi, thanks for bringing this up! It does seem that WeightedRandomSampler isn't compatible with DDP training out of the box (there's a similar thread on the issue here: Lightning-AI/pytorch-lightning#10946). I saw no reduction in training time when moving to multiple GPUs, and this is possibly why.
Let me know if you solve it!
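To make the failure mode concrete, here is a minimal sketch of the usual single-process setup (the toy dataset and the `sample_weights` name are illustrative, not from the repo). The problem is that WeightedRandomSampler has no notion of rank or world size, so under DDP every process builds the same sampler and iterates the full weighted epoch on its own:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy imbalanced dataset: 90 samples of class 0, 10 of class 1.
labels = torch.cat([torch.zeros(90, dtype=torch.long),
                    torch.ones(10, dtype=torch.long)])
dataset = TensorDataset(torch.randn(100, 8), labels)

# Inverse-frequency weight for each sample.
class_counts = torch.bincount(labels).float()
sample_weights = 1.0 / class_counts[labels]

sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(dataset),
                                replacement=True)

# Under DDP, every rank would build this identical sampler, so the data
# is never sharded across GPUs and per-epoch work is not divided --
# consistent with seeing no speedup on multiple GPUs.
loader = DataLoader(dataset, batch_size=16, sampler=sampler)
```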
By the way, there are certain datasets where weighted sampling is not required. We used it here because BBBC021 had quite a large class imbalance for some of the labels. An easy fix might be to drop weighted sampling where it isn't needed!
However, I think even that path might not be initialized properly in the current code and probably does need a fix.
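If you do drop the weighted sampling, the usual DDP pattern is one DistributedSampler per process. A minimal sketch, assuming the process group is already initialized (e.g. launched with torchrun) and reusing the toy `dataset` from the sketch above, with `num_epochs` standing in for whatever your training script uses:

```python
from torch.utils.data import DataLoader, DistributedSampler

# DistributedSampler shards the dataset across ranks; it reads the rank
# and world size from the initialized process group.
sampler = DistributedSampler(dataset, shuffle=True)
loader = DataLoader(dataset, batch_size=16, sampler=sampler)

num_epochs = 10  # illustrative
for epoch in range(num_epochs):
    sampler.set_epoch(epoch)  # reshuffle with a different seed each epoch
    for images, targets in loader:
        ...  # forward/backward as usual
```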
Thanks for your reply! https://discuss.pytorch.org/t/how-to-use-my-own-sampler-when-i-already-use-distributedsampler/62143/8 gives a distributed weighted sampler example that I have already used in some of my experiments. I haven't tried it on WS-DINO yet, but you could take a look if you're interested.
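For anyone landing here, the pattern from that thread looks roughly like this (a sketch only; the class name and details are illustrative, and I haven't tried it on WS-DINO). The idea is to have every rank draw the same global weighted sample with a shared per-epoch seed, then keep only its own interleaved slice, mirroring what DistributedSampler does for uniform sampling:

```python
import math
import torch
import torch.distributed as dist
from torch.utils.data import Sampler

class DistributedWeightedSampler(Sampler):
    """Weighted sampling with DDP-style sharding (illustrative sketch)."""

    def __init__(self, weights, num_replicas=None, rank=None, replacement=True):
        if num_replicas is None:
            num_replicas = dist.get_world_size()
        if rank is None:
            rank = dist.get_rank()
        self.weights = torch.as_tensor(weights, dtype=torch.double)
        self.num_replicas = num_replicas
        self.rank = rank
        self.replacement = replacement
        self.epoch = 0
        self.num_samples = math.ceil(len(self.weights) / num_replicas)
        self.total_size = self.num_samples * num_replicas

    def __iter__(self):
        # Same seed on every rank, so the global weighted draw is identical...
        g = torch.Generator()
        g.manual_seed(self.epoch)
        indices = torch.multinomial(self.weights, self.total_size,
                                    self.replacement, generator=g).tolist()
        # ...and each rank keeps a disjoint interleaved slice of it.
        return iter(indices[self.rank:self.total_size:self.num_replicas])

    def __len__(self):
        return self.num_samples

    def set_epoch(self, epoch):
        # Call at the start of every epoch (like DistributedSampler)
        # so different epochs draw different samples.
        self.epoch = epoch
```

You'd pass it to the DataLoader in place of WeightedRandomSampler, e.g. `sampler = DistributedWeightedSampler(sample_weights)`, and call `sampler.set_epoch(epoch)` each epoch.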
I tried to work around it by not using weighted sampling on my dataset, which has binary labels and several "treatments". However, I found the loss did not decrease. For reference, I tested the original DINO and it decreased normally. I haven't found the reason yet but will keep working on it.
Thanks for your great work on this!
@gabrielpan147 Feel free to send me your implementation and I can have a look