jiecaoyu/pytorch_imagenet

Why not give the whole model to DataParallel?

Anonymous-so opened this issue · 2 comments

I am confused by the following code in the main function:

```python
if args.arch.startswith('alexnet') or args.arch.startswith('vgg'):
    model.features = torch.nn.DataParallel(model.features)
```

May I ask why the whole model isn't given to `DataParallel`, like this?

```python
model = torch.nn.DataParallel(model).cuda()
```

The fully-connected (FC) layers hold most of the model's weights. Wrapping the whole model in `DataParallel` would copy those weights to every GPU on each forward pass, which costs a lot of execution time. Parallelizing only the convolutional `features` and computing the FC layers on a single GPU avoids that copy and speeds up training.
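A minimal sketch of this pattern, using a hypothetical toy network (`ToyNet` is an illustration, not the repo's model). Only the convolutional part is wrapped in `DataParallel`, so the large FC weights stay on one device. On a machine without GPUs, `DataParallel` simply falls back to running the wrapped module directly:

```python
import torch
import torch.nn as nn

# Hypothetical AlexNet-style model: a small convolutional feature
# extractor followed by a fully-connected classifier.
class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # FC weights live on a single device and are never broadcast
            nn.Linear(8 * 4 * 4, 16),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ToyNet()
# Replicate only the convolutional part across available GPUs;
# the classifier (FC layers) runs on the default device.
model.features = nn.DataParallel(model.features)
if torch.cuda.is_available():
    model.cuda()

out = model(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 16])
```

With a full-model `DataParallel(model)`, the replication step would also ship the `classifier` weights to every GPU each iteration; splitting the wrap as above limits the per-step copy to the much smaller convolutional weights.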

Oh I see, thanks!