turboderp/exllamav2

Pipeline mode support

Closed this issue · 2 comments

I found that PyTorch 2.4 officially supports pipeline parallelism; the package is torch.distributed.pipelining. It was migrated from the PiPPy project.
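
For reference, a minimal sketch of how that package is used (the toy model, split point, and sizes below are placeholders for illustration, not exllamav2 code):

```python
# Run with: torchrun --nproc_per_node=2 pp_sketch.py
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.pipelining import pipeline, SplitPoint, ScheduleGPipe

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(*[nn.Linear(512, 512) for _ in range(8)])

    def forward(self, x):
        return self.layers(x)

def main():
    rank = int(os.environ["RANK"])
    dist.init_process_group("nccl")
    device = torch.device(f"cuda:{rank}")

    model = ToyModel()
    example_microbatch = torch.randn(8, 512)   # one microbatch for tracing
    full_batch = torch.randn(32, 512)          # split into 4 microbatches below

    # Trace the model and split it into two pipeline stages at layer 4.
    pipe = pipeline(
        model,
        mb_args=(example_microbatch,),
        split_spec={"layers.4": SplitPoint.BEGINNING},
    )

    # Each rank builds and runs its own stage with a GPipe schedule.
    stage = pipe.build_stage(rank, device)
    schedule = ScheduleGPipe(stage, n_microbatches=4)

    if rank == 0:
        schedule.step(full_batch.to(device))   # first stage feeds the input
    else:
        out = schedule.step()                  # last stage returns the output
        print(out.shape)

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```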

Will exllamav2 add support for Pipeline Parallelism sometime?

Possibly, but not before fully exploring tensor parallelism, which potentially makes PP totally redundant. I.e., why run two staggered batches at 1x speed when you can run a single batch at 2x speed?

TP has high communication requirements, so the performance improvement for PCIe devices might not be as significant as expected. On the other hand, PP has lower communication requirements, theoretically leading to a larger performance boost.

However, PP doesn't seem to be very useful for single-batch inference. 😂😂😂