loeweX/Greedy_InfoMax

Question on parallel training

lizhenstat opened this issue · 2 comments

Hi, your work is very interesting, thanks for sharing the code!

As I understand it, the losses for the different sub-networks are calculated in a for loop in https://github.com/loeweX/Greedy_InfoMax/blob/master/GreedyInfoMax/vision/models/FullModel.py#L102
and therefore the training is "asynchronous".
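
For reference, I read that loop as doing roughly the following (a minimal PyTorch sketch of my understanding, not the actual `FullModel` code; `GreedyStack` and `loss_fns` are hypothetical names):

```python
import torch.nn as nn

class GreedyStack(nn.Module):
    # Minimal sketch of greedy, module-local loss computation
    # (hypothetical; not the repository's actual FullModel code).
    def __init__(self, blocks, loss_fns):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        self.loss_fns = nn.ModuleList(loss_fns)

    def forward(self, x):
        per_module_losses = []
        for block, loss_fn in zip(self.blocks, self.loss_fns):
            # detach() stops gradients at the module boundary, so each
            # loss only ever updates the parameters of its own module
            x = block(x.detach())
            per_module_losses.append(loss_fn(x))
        return per_module_losses
```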

However, I have one question on parallel training:
This great blog post says: "This reduces the amount of communication needed between modules tremendously and allows us to train modules on separate devices."
Does that mean you put the three sub-modules on three GPUs and perform the gradient updates simultaneously?

Besides, is it possible to train the different modules in parallel on a single GPU?

Thanks very much for your time!

Hi,

Thanks for your interest in our work!

Indeed, I have not implemented the parallel training described in the blog post; so far, it is only a theoretical consideration of the potential benefit that local learning might bring. If you wanted to implement it, the picture below might make the idea clearer:

[Screenshot 2021-04-08: diagram of the modules, each trained on its own GPU, with local gradient updates shown as gray boxes]

Essentially, each module is trained on a separate GPU with its own gradient updates (depicted by the gray boxes). The only communication between the modules/GPUs occurs when a lower module provides a new "dataset" for the module above it to train on (every x epochs or so); a rough sketch of what this could look like follows.
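
To make this a bit more concrete, one local training step in such a scheme could look roughly like this (purely hypothetical PyTorch sketch; `train_module`, `refresh_cache`, and the caching strategy are illustrative names and are not part of this repository):

```python
import torch

def train_module(module, optimizer, loss_fn, cached_inputs, device):
    """One local epoch for a single module on its own GPU (hypothetical)."""
    module.to(device).train()
    for x in cached_inputs:      # cached activations from the module below
        x = x.to(device)
        loss = loss_fn(module(x))
        optimizer.zero_grad()
        loss.backward()          # gradients stay within this module
        optimizer.step()

def refresh_cache(lower_module, inputs, device):
    """Lower module provides a new 'dataset' for the module above it."""
    lower_module.to(device).eval()
    with torch.no_grad():
        return [lower_module(x.to(device)).cpu() for x in inputs]
```

Each module's loop could then run in its own process (e.g. with `torch.multiprocessing`), with the cache refresh as the only synchronisation point between the GPUs.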

Regarding your second question: unfortunately, I do not know enough about parallelisation on GPUs to answer that. In the current implementation, the different modules can be trained on a single GPU, but they are not trained in parallel.

Best,
Sindy

Thanks for your quick and detailed reply! I understand your point now. ^_^

Best
Lizhen