Measuring Communication Amount

Question

Closed this issue 8 months ago · 3 comments

Hi, thanks for the great work! I'm wondering how the communication amount in Table 2 of the paper is calculated. Are those calculations available in the evaluation script?

Answer 1 · 2024-05-07T03:54:27.000Z

In PatchParallelismCommManager, we print the buffer size on each device (self.numel). You can enable this profiling by passing verbose=True when initializing the distri_config.

For ring AllGather, the communication amount is $s \times (n-1) \times 2$ bytes, where $s$ is the buffer size in elements, $n$ is the number of devices, and the factor 2 is the 2 bytes per element for FP16 precision.
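As a minimal sketch of the AllGather formula above, the helper below plugs a buffer size into $s \times (n-1) \times 2$. The function name `ring_allgather_bytes` and the example numbers are hypothetical and are not part of the repository; the buffer size would come from the value printed with verbose=True.

```python
def ring_allgather_bytes(numel: int, n_devices: int, bytes_per_element: int = 2) -> int:
    """Ring AllGather communication amount in bytes: s * (n - 1) * bytes_per_element.

    numel is the buffer size s in elements; bytes_per_element defaults to 2 (FP16).
    """
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    return numel * (n_devices - 1) * bytes_per_element


# Hypothetical example: a buffer of 1,000,000 elements with 4 devices.
amount = ring_allgather_bytes(numel=1_000_000, n_devices=4)
print(f"ring AllGather: {amount} bytes ({amount / 2**20:.2f} MiB)")
```

With a single device the result is 0, since nothing needs to be exchanged.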

Answer 2 · 2024-05-07T03:59:38.000Z

For ring AllReduce, the communication amount is $s \times \frac{n-1}{n} \times 2 \times 2$ bytes. The first 2 accounts for the 2 rounds of ring AllReduce, and the second 2 is the 2 bytes per element for FP16 precision. For Tensor Parallelism, our code does not currently support printing the buffer size $s$. You can calculate it by summing the numel of every AllReduced tensor in attention.py, conv.py, feed_forward.py and resnet.py.
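A minimal sketch of the AllReduce calculation, assuming you have already collected the numel of each AllReduced tensor yourself. The helper names and the example sizes are hypothetical; the repository does not provide these functions.

```python
from typing import Iterable


def ring_allreduce_bytes(numel: int, n_devices: int, bytes_per_element: int = 2) -> float:
    """Ring AllReduce communication amount in bytes: s * (n - 1) / n * 2 * bytes_per_element.

    The factor 2 is for the 2 rounds of ring AllReduce; bytes_per_element defaults to 2 (FP16).
    """
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    return 2 * numel * (n_devices - 1) / n_devices * bytes_per_element


def total_allreduce_bytes(numels: Iterable[int], n_devices: int, bytes_per_element: int = 2) -> float:
    """Sum the numel of every AllReduced tensor to get s, then apply the formula."""
    return ring_allreduce_bytes(sum(numels), n_devices, bytes_per_element)


# Hypothetical numel values, standing in for the AllReduced tensors
# you would find in attention.py, conv.py, feed_forward.py and resnet.py.
hypothetical_numels = [4096, 8192, 4096]
total = total_allreduce_bytes(hypothetical_numels, n_devices=4)
print(f"ring AllReduce: {total:.0f} bytes")
```

Because the formula is linear in $s$, summing the numel values first and applying it once gives the same result as applying it per tensor and adding the results.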

Answer 3 · 2024-05-07T04:02:08.000Z

You can refer to our efficientml.ai slides (page 50) for these communication primitives.