Why shuffle BN in local group?
jp7c5 opened this issue · 2 comments
Hello again, @HobbitLong.
Reading the code, it looks like shuffle BN is applied within each node through the self.local_group variable, although it could be applied across all nodes.
Could you tell me why you chose this approach?
My guess is that this reduces shuffling time without hurting performance. However, if we run this code on multiple nodes with 1 GPU each, shuffling BN might have no effect, since each local group would then contain only a single GPU.
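To make the 1-GPU-per-node concern concrete, here is a minimal pure-Python simulation of only the index permutation behind shuffle BN, not the repository's actual implementation. The function names `shuffle_within_groups` and `bn_batches_changed` are hypothetical. The sketch assumes each GPU holds a contiguous block of `batch_per_gpu` samples and that samples are permuted only among GPUs of the same node. BN statistics on a GPU depend only on the set of samples it sees. So if the shuffle merely reorders samples that stay on the same GPU, the statistics are unchanged.

```python
import random


def shuffle_within_groups(num_nodes, gpus_per_node, batch_per_gpu, seed=0):
    """Simulate the index permutation of shuffle BN restricted to each node.

    Returns one list per GPU (in global rank order) holding the sample indices
    that GPU would feed to the key encoder after the shuffle. Samples are only
    permuted among GPUs of the same node; num_nodes=1 makes the shuffle global.
    """
    rng = random.Random(seed)
    per_node = gpus_per_node * batch_per_gpu
    assignment = []
    for node in range(num_nodes):
        # All samples currently held by this node's GPUs.
        node_samples = list(range(node * per_node, (node + 1) * per_node))
        rng.shuffle(node_samples)
        # Deal the shuffled samples back out to this node's GPUs.
        for g in range(gpus_per_node):
            assignment.append(node_samples[g * batch_per_gpu:(g + 1) * batch_per_gpu])
    return assignment


def bn_batches_changed(assignment, batch_per_gpu):
    """True if any GPU's BN would see a different set of samples than before."""
    for rank, samples in enumerate(assignment):
        original = set(range(rank * batch_per_gpu, (rank + 1) * batch_per_gpu))
        if set(samples) != original:
            return True
    return False


# 4 nodes x 1 GPU: a per-node shuffle only reorders samples on the same GPU.
local = shuffle_within_groups(num_nodes=4, gpus_per_node=1, batch_per_gpu=8)
print(bn_batches_changed(local, batch_per_gpu=8))  # False

# The same 4 GPUs shuffled as one global group do mix samples across GPUs.
global_ = shuffle_within_groups(num_nodes=1, gpus_per_node=4, batch_per_gpu=8)
print(bn_batches_changed(global_, batch_per_gpu=8))
```

With several GPUs per node, the per-node shuffle still mixes samples across that node's GPUs. The degenerate case is exactly one GPU per node.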
Hi @jp7c5,
Your understanding is completely correct. The purpose is just to save shuffle time, since I sometimes use 32 GPUs across 4 nodes.
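A rough way to see where the time goes: assume the shuffle all-gathers the key batch within the shuffle group, after which each GPU picks its permuted slice. This is a common way to implement shuffle BN, but it is an assumption here, not a description of this repository's code. Under that assumption, the hypothetical helper below counts how many per-GPU batch chunks each GPU receives, split by whether they come from its own node or over the inter-node link.

```python
def chunks_received(num_nodes, gpus_per_node, scope):
    """Per-GPU count of batch chunks received in one all-gather over the
    shuffle group, as (from the same node, from other nodes).

    Assumes an all-gather-based shuffle; 'scope' is a hypothetical switch.
    """
    if scope == "local":   # group = the GPUs of one node
        return gpus_per_node - 1, 0
    if scope == "global":  # group = every GPU in the job
        return gpus_per_node - 1, (num_nodes - 1) * gpus_per_node
    raise ValueError(f"unknown scope: {scope!r}")


# The 32-GPU / 4-node setup (8 GPUs per node).
print(chunks_received(4, 8, "global"))  # (7, 24)
print(chunks_received(4, 8, "local"))   # (7, 0)

# 1 GPU per node: a local group has nothing to exchange at all.
print(chunks_received(4, 1, "local"))   # (0, 0)
```

In this setup, restricting the group to one node removes all 24 cross-node chunks per GPU per step. It also shows why a local group becomes a no-op when each node has a single GPU.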
> but if we run this code in multiple nodes with 1 gpu for each, shuffling BN might have no effect.
Yes, in that case the code needs to be modified (maybe I can provide such an option in a future version).
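One way such an option could look is sketched below. It is purely hypothetical and not part of the repository: a `scope` setting that decides which ranks form each shuffle group. Its `"auto"` mode keeps the per-node groups but falls back to a single global group when every node has only one GPU.

```python
def shuffle_groups(num_nodes, gpus_per_node, scope="auto"):
    """Return the lists of global ranks that shuffle BN would permute within.

    'local'  -> one group per node (the current behaviour described above)
    'global' -> one group containing every rank
    'auto'   -> hypothetical: local, unless each node has a single GPU
    """
    ranks = list(range(num_nodes * gpus_per_node))
    if scope == "auto":
        scope = "global" if gpus_per_node == 1 else "local"
    if scope == "local":
        return [ranks[n * gpus_per_node:(n + 1) * gpus_per_node]
                for n in range(num_nodes)]
    if scope == "global":
        return [ranks]
    raise ValueError(f"unknown scope: {scope!r}")


print(shuffle_groups(4, 1))  # [[0, 1, 2, 3]]
print(shuffle_groups(2, 2))  # [[0, 1], [2, 3]]
```

In a real job, each returned list could be turned into a process group with `torch.distributed.new_group(ranks=...)`. Keep in mind that every process must call `new_group` for every group, including groups it does not belong to.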
I see. Thanks for the quick reply!