jiecaoyu/pytorch_imagenet

Why not give the whole model to DataParallel?

Anonymous-so opened this issue · 2 comments

I am confused by the following code in the main function:

```python
if args.arch.startswith('alexnet') or args.arch.startswith('vgg'):
    model.features = torch.nn.DataParallel(model.features)
```

May I ask why the whole model isn't given to `DataParallel`, like this?

```python
model = torch.nn.DataParallel(model).cuda()
```

The fully-connected (FC) layers hold most of the model's weights. Wrapping the whole model in `DataParallel` would copy those weights to every GPU on each forward pass, which costs a lot of execution time. Parallelizing only the convolutional `features` and computing the FC layers on a single GPU avoids that copy and speeds up training.
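A minimal sketch of this pattern, using a hypothetical toy network (`ToyNet` is an illustration, not the repo's model). Only the convolutional part is wrapped in `DataParallel`, so the large FC weights stay on one device. On a machine without GPUs, `DataParallel` simply falls back to running the wrapped module directly:

```python
import torch
import torch.nn as nn

# Hypothetical AlexNet-style model: a small convolutional feature
# extractor followed by a fully-connected classifier.
class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # FC weights live on a single device and are never broadcast
            nn.Linear(8 * 4 * 4, 16),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ToyNet()
# Replicate only the convolutional part across available GPUs;
# the classifier (FC layers) runs on the default device.
model.features = nn.DataParallel(model.features)
if torch.cuda.is_available():
    model.cuda()

out = model(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 16])
```

With a full-model `DataParallel(model)`, the replication step would also ship the `classifier` weights to every GPU each iteration; splitting the wrap as above limits the per-step copy to the much smaller convolutional weights.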

Oh I see, thanks!