loeweX/Greedy_InfoMax

Question on parallel training

lizhenstat opened this issue · 2 comments

Hi, your work is very interesting, thanks for sharing the code!

As I understand it, the losses for the different sub-networks are calculated in a for loop in https://github.com/loeweX/Greedy_InfoMax/blob/master/GreedyInfoMax/vision/models/FullModel.py#L102
and therefore the training is "asynchronous".
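
For reference, I read that loop as doing roughly the following (a minimal PyTorch sketch of my understanding, not the actual `FullModel` code; `GreedyStack` and `loss_fns` are hypothetical names):

```python
import torch.nn as nn

class GreedyStack(nn.Module):
    # Minimal sketch of greedy, module-local loss computation
    # (hypothetical; not the repository's actual FullModel code).
    def __init__(self, blocks, loss_fns):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        self.loss_fns = nn.ModuleList(loss_fns)

    def forward(self, x):
        per_module_losses = []
        for block, loss_fn in zip(self.blocks, self.loss_fns):
            # detach() stops gradients at the module boundary, so each
            # loss only ever updates the parameters of its own module
            x = block(x.detach())
            per_module_losses.append(loss_fn(x))
        return per_module_losses
```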

However, I have one question on parallel training:
This great blog post says: "This reduces the amount of communication needed between modules tremendously and allows us to train modules on separate devices."
Does that mean you put the three sub-modules on three GPUs and perform the gradient updates simultaneously?

Besides, is it possible to train the different modules in parallel on a single GPU?

Thanks very much for your time!

Hi,

Thanks for your interest in our work!

Indeed, I have not implemented the parallel training described in the blog post; so far, it is only a theoretical consideration of the potential benefit that local learning might bring. If you wanted to implement it, the picture below might make the idea clearer:

[Screenshot 2021-04-08: diagram of the modules, each trained on its own GPU, with local gradient updates shown as gray boxes]

Essentially, each module is trained on a separate GPU with its own gradient updates (depicted by the gray boxes). The only communication between the modules/GPUs occurs when a lower module provides a new "dataset" for the module above it to train on (every x epochs or so); a rough sketch of what this could look like follows.
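
To make this a bit more concrete, one local training step in such a scheme could look roughly like this (purely hypothetical PyTorch sketch; `train_module`, `refresh_cache`, and the caching strategy are illustrative names and are not part of this repository):

```python
import torch

def train_module(module, optimizer, loss_fn, cached_inputs, device):
    """One local epoch for a single module on its own GPU (hypothetical)."""
    module.to(device).train()
    for x in cached_inputs:      # cached activations from the module below
        x = x.to(device)
        loss = loss_fn(module(x))
        optimizer.zero_grad()
        loss.backward()          # gradients stay within this module
        optimizer.step()

def refresh_cache(lower_module, inputs, device):
    """Lower module provides a new 'dataset' for the module above it."""
    lower_module.to(device).eval()
    with torch.no_grad():
        return [lower_module(x.to(device)).cpu() for x in inputs]
```

Each module's loop could then run in its own process (e.g. with `torch.multiprocessing`), with the cache refresh as the only synchronisation point between the GPUs.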

Regarding your second question: unfortunately, I do not know enough about parallelisation on GPUs to answer that. In the current implementation, the different modules can be trained on a single GPU, but they are not trained in parallel.

Best,
Sindy

Thanks for your quick and detailed reply! I understand your point now. ^_^

Best
Lizhen