Hi, I am training LeNet on localhost for 15 iterations. I got the results shown in the picture: the total communication is reported as 900 Mb, but the communication for P0 alone is already 1673.28 MB. Is something wrong here?
I think there is an issue with the communication part (and there is an issue with parallelization as well). Either a counter is wrapping around at its bit width or something else is off. Can you please probe into this further?
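To make the wrap-around possibility concrete, here is a minimal sketch of how a fixed-width byte counter can end up reporting a total that is smaller than the traffic actually sent. Everything in it is an assumption for illustration: the `accumulate` helper, the 32-bit width, and the message sizes are hypothetical and are not taken from the codebase or from the numbers in the picture.

```python
def accumulate(sizes, bits=None):
    """Sum message sizes in bytes.

    If `bits` is given, the running total wraps like an unsigned
    counter of that width (hypothetical model of the suspected bug).
    """
    total = 0
    for size in sizes:
        total += size
        if bits is not None:
            total &= (1 << bits) - 1  # keep only the low `bits` bits
    return total


# Hypothetical traffic: 6000 messages of 1 MiB each.
sizes = [1 << 20] * 6000

true_total = accumulate(sizes)              # unbounded Python integer
wrapped_total = accumulate(sizes, bits=32)  # simulated 32-bit counter

print(f"true total:    {true_total / 2**20:.2f} MiB")
print(f"wrapped total: {wrapped_total / 2**20:.2f} MiB")
```

If the real totals are accumulated in a narrower integer type than the per-party counters, a mismatch of this kind would be expected; checking the declared types of both counters would confirm or rule it out.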