msr-fiddle/pipedream

Batch size and optimizer


Hi,
I see that the profiler measures memory usage and execution times for a particular batch size, but the optimizer code does not take a batch size parameter. Does that mean that, in the optimizer logic, the execution times and activation memory figures are normalized?

Best

Why would they need to be? Right now, the optimizer does not try to sweep the batch size; it assumes the batch size is provided as an input.

So the optimizer divides the work equally among all GPUs, irrespective of the memory available?

No, that's not true. But why would you need to normalize by batch size to decide whether something fits? The batch size is not a knob we try to tune: given a batch size, we know the computation times and activation sizes, and we can use that information to make placement decisions for that particular batch size.
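
To make this concrete, here is a minimal sketch (not PipeDream's actual optimizer code; all names and numbers are hypothetical) of how per-layer profiles taken at one fixed batch size can be used directly to check whether a candidate stage fits in GPU memory, with no normalization needed:

```python
from typing import List

def stage_fits(layer_activation_bytes: List[int],
               layer_param_bytes: List[int],
               gpu_memory_bytes: int) -> bool:
    """Return True if the layers assigned to one stage fit on a single GPU."""
    # The profiles were measured at a specific batch size, so the numbers
    # are already "per batch" -- no normalization is required.
    needed = sum(layer_activation_bytes) + sum(layer_param_bytes)
    return needed <= gpu_memory_bytes

# Hypothetical per-layer profile, measured at batch size 64 (in bytes).
activations = [512 * 2**20, 256 * 2**20, 128 * 2**20]
params = [40 * 2**20, 80 * 2**20, 20 * 2**20]

print(stage_fits(activations, params, gpu_memory_bytes=16 * 2**30))  # True
```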

I see your point. So if I want to run with a different batch size, I have to start over from the profiler, then the optimizer, and so on? Just wanted to verify this.

I was under the impression that the profiler's output could be reused for any batch size (of a particular model).

Right, that's correct. Computation time doesn't scale linearly with batch size (throughput itself is a function of batch size), so you would want to re-run the profiler for each batch size anyway to get accurate timing estimates. You could in principle reuse the activation size measurements, but we don't currently do this. Hope this clarifies!
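
A small illustration of the point about nonlinear scaling, with entirely made-up numbers: larger batches usually improve GPU utilization, so per-iteration time grows sublinearly, and naively scaling a profile taken at one batch size can badly misestimate the times at another.

```python
# Hypothetical per-iteration times (ms) at three batch sizes.
measured_ms = {16: 30.0, 32: 48.0, 64: 80.0}

base_bs, base_ms = 16, measured_ms[16]
for bs, actual in sorted(measured_ms.items()):
    # Naive linear extrapolation from the batch-size-16 measurement.
    linear_estimate = base_ms * (bs / base_bs)
    print(f"batch {bs}: measured {actual:.0f} ms, "
          f"linear estimate {linear_estimate:.0f} ms")

# At batch 64 the naive estimate (120 ms) overshoots the measured 80 ms by
# 50%, which is why re-profiling at each batch size gives the optimizer
# more accurate inputs.
```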

Thanks for the clarification.