msamogh/nonechucks

Is this package running in parallel or single worker?

spacegoing opened this issue · 8 comments

It seems to use only 1 worker, even though I started 20.

Hi @spacegoing, I think this issue might be related to #4. I'll look into it soon.

Yes please:D

@spacegoing Do you have a minimal piece of code that I can test with?

@msamogh Sorry, I don't have a minimal one. I have a paper due tomorrow and am rushing to finish it. I'll try to write one after that. Many thanks :D

Can you verify whether the latest commit (37eee52) fixes the problem?

Hi, I have been playing around with nonechucks a bit. I observed that if I use SafeDataset with a standard DataLoader (using the default sequential sampler), my CPUs are fully loaded. However, when I use the DataLoader with SafeSampler, I usually see only one process running while the others sleep (probably waiting for synchronization). Could it be that the while loop in SafeSampler's __next__() method forces the workers to synchronize? The performance difference between using and not using SafeSampler is really HUGE...

However, I understand that if I use DataLoader without SafeSampler, the same examples can be returned several times, which is unusable in my case.
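
One way to picture the hypothesis above is the torch-free sketch below. It is not nonechucks' actual code. PyTorch's DataLoader draws indices from the sampler in the main process, so if a sampler checks each index by loading the sample inside __next__(), that loading would run serially in the main process no matter how many workers exist. RecordingDataset and ValidatingSampler are hypothetical names, and whether SafeSampler really validates indices this way is an unconfirmed assumption.

```python
import os


class RecordingDataset:
    """Toy map-style dataset that records which process loads each index."""

    def __init__(self, size, bad_indices=()):
        self.size = size
        self.bad_indices = set(bad_indices)
        self.loaded_by = []  # (index, pid) for every load that happens

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        self.loaded_by.append((index, os.getpid()))
        # None stands in for a sample that failed to load.
        return None if index in self.bad_indices else index * 10


class ValidatingSampler:
    """Hypothetical sampler that yields only indices whose sample loads."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.position = 0

    def __iter__(self):
        return self

    def __next__(self):
        # The while loop under suspicion: every candidate index is checked by
        # loading the sample here, in whichever process consumes the sampler.
        while self.position < len(self.dataset):
            index = self.position
            self.position += 1
            if self.dataset[index] is not None:
                return index
        raise StopIteration


dataset = RecordingDataset(size=6, bad_indices={1, 4})
valid_indices = list(ValidatingSampler(dataset))
loader_pids = {pid for _, pid in dataset.loaded_by}

print(valid_indices)                 # [0, 2, 3, 5]
print(loader_pids == {os.getpid()})  # True: every load ran in this process
```

If this picture matches what SafeSampler does, a cache shared between the sampler and the workers, or moving the validity check into the workers, would be the places to look. That is a guess based on the sketch, not a confirmed diagnosis.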

@brejchajan Thanks for the detailed description. Could you open this as a new issue?

@spacegoing I'll assume this issue is solved and close it. If it still isn't resolved, please let me know.