bearpaw/pytorch-pose

How to choose a good number of workers and batch size on multi-GPU?

Closed this issue · 8 comments

My server has 4 × 1080Ti GPUs. When I run the code on multiple GPUs it always gets stuck, so what numbers work well?

vicdu commented

@taojake I have the same problem. How do you set the parameters? Thanks.

@vicdu Hi, I use 2 GPUs. With a batch size of 50 and 20 workers, the speed is almost 2x, but the accuracy drops a little.
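
For reference, a minimal sketch of that kind of setup with `torch.nn.DataParallel`; the dataset and model below are random stand-ins rather than the ones from this repo, so substitute the real pose dataset and network:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Random stand-in data and model -- replace with the real pose dataset/network.
dataset = TensorDataset(torch.randn(2000, 3, 64, 64), torch.randint(0, 16, (2000,)))
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(16, 16))

# DataParallel splits each batch across the visible GPUs, so batch_size below
# is the total batch (50 -> 25 per GPU with 2 GPUs).
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model, device_ids=[0, 1])
if torch.cuda.is_available():
    model = model.cuda()

# batch_size=50 / num_workers=20 mirror the settings mentioned above;
# pin_memory=True speeds up host-to-GPU copies.
loader = DataLoader(dataset, batch_size=50, shuffle=True,
                    num_workers=20, pin_memory=True)

for images, targets in loader:
    if torch.cuda.is_available():
        images = images.cuda(non_blocking=True)
    outputs = model(images)  # forward pass is split across both GPUs
    break
```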

vicdu commented

@taojake Thanks. Do you know the reason? And how would you go about adjusting the parameters for multiple GPUs?

@vicdu I am not sure. My guess is that the workers need time to load data; with too many or too few workers the GPUs can't receive data smoothly, and training gets stuck.
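
A quick way to settle on `num_workers` is simply to time one pass over the data at a few settings and keep the fastest one. A rough sketch, using a dummy in-memory dataset as a stand-in for the real one (timings are only meaningful with the real, disk-backed dataset):

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset; replace with the real dataset to get meaningful numbers.
dataset = TensorDataset(torch.randn(5000, 3, 64, 64), torch.randint(0, 16, (5000,)))

for workers in (0, 4, 8, 16, 20):
    loader = DataLoader(dataset, batch_size=50, shuffle=True,
                        num_workers=workers, pin_memory=True)
    start = time.time()
    for _ in loader:   # iterate once: measures pure data-loading throughput
        pass
    print(f"num_workers={workers}: {time.time() - start:.1f}s per epoch of loading")
```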

vicdu commented

@taojake Do you think a large batch size (~200) will have a bad effect on the results?

@vicdu In general, results improve as the batch size grows, but in deep learning a very large batch is impractical, so we randomly sample a mini-batch of examples to represent the data distribution. In practice we usually tune the batch size by hand; given enough epochs the results end up about the same.

Sometimes a large batch size results in worse performance. See https://stats.stackexchange.com/questions/164876/tradeoff-batch-size-vs-number-of-iterations-to-train-a-neural-network

Also, the bottleneck here might be I/O rather than GPU compute.
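
One rough way to check whether data loading or GPU compute dominates is to time the two separately inside the training loop. A sketch with stand-in data and model (swap in the real loader and network to profile your own run):

```python
import time
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data and model; swap in the real loader/network to profile your run.
dataset = TensorDataset(torch.randn(2000, 3, 64, 64), torch.randint(0, 16, (2000,)))
loader = DataLoader(dataset, batch_size=50, num_workers=8, pin_memory=True)
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(16, 16))
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

load_time, compute_time = 0.0, 0.0
end = time.time()
for images, _ in loader:
    t_batch = time.time()
    load_time += t_batch - end              # time spent waiting on the DataLoader
    images = images.to(device, non_blocking=True)
    _ = model(images)
    if device == "cuda":
        torch.cuda.synchronize()            # make GPU work show up in the timer
    end = time.time()
    compute_time += end - t_batch

print(f"data loading: {load_time:.1f}s  vs  model compute: {compute_time:.1f}s")
```

If the loading time dominates, more workers (or faster storage) helps more than a bigger batch.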

Awesome conclusion, thanks for sharing!