facebookincubator/AITemplate

Tensor Parallelism

kamalkraj opened this issue · 3 comments

Is it possible to implement Tensor Parallelism using AITemplate?
@tenpercent @apivovarov

Hi,

Currently we don't have plans for intra-op parallelism. Are you running into a case where the model doesn't fit in a single device's memory?

Hi @muchulee8,

I am looking for the best practice for optimizing and deploying a 30B LLM model using Multi GPUs.

One possible solution would be to split the model by transformer blocks and store the weights on separate GPUs. This isn't available out of the box in AITemplate and would require some manual routing of activations to inputs between the engines.
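The manual routing idea could be sketched roughly as follows. This is a hypothetical illustration, not an AITemplate API: the "engines" are plain Python callables standing in for compiled per-stage modules, and `copy_to_device` is a stub for the actual host/device transfer you would do between GPUs.

```python
# Hypothetical sketch: split a model by transformer blocks into two "engines"
# and route activations -> inputs between them by hand.

def make_engine(blocks):
    """Return a callable that runs a list of per-block functions in order."""
    def engine(x):
        for block in blocks:
            x = block(x)
        return x
    return engine

# Toy "transformer blocks": block i just adds i to every element.
all_blocks = [lambda x, i=i: [v + i for v in x] for i in range(8)]

# Split by block: first half conceptually on GPU 0, second half on GPU 1.
engine_gpu0 = make_engine(all_blocks[:4])   # blocks 0-3
engine_gpu1 = make_engine(all_blocks[4:])   # blocks 4-7

def copy_to_device(activations, device):
    """Stub for the inter-GPU transfer between engines (e.g. a peer copy)."""
    return list(activations)

x = [0.0, 1.0]
h = engine_gpu0(x)                 # run the first half of the model
h = copy_to_device(h, "cuda:1")    # manual activation -> input routing
y = engine_gpu1(h)                 # run the second half
print(y)  # [28.0, 29.0] since 0+1+...+7 = 28 is added to each element
```

The per-stage split and the explicit transfer in the middle are exactly the parts that would have to be wired up by hand today.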

Another option is to wait until quantization is supported. I think int8/fp8 support is going to happen eventually.

Finally, it sounds like 30B parameters might fit on an 80GB A100 in fp16 precision?
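The fp16 fit can be checked with back-of-the-envelope arithmetic: 2 bytes per parameter for the weights alone. Note this counts only weights; activations and any KV cache add overhead on top, so the real fit depends on batch size and sequence length.

```python
# Weight memory for a 30B-parameter model in fp16 (2 bytes per parameter).
params = 30e9
bytes_per_param = 2                       # fp16
weight_gib = params * bytes_per_param / 2**30
print(round(weight_gib, 1))               # ~55.9 GiB of weights

# int8 quantization would roughly halve that.
int8_gib = params * 1 / 2**30
print(round(int8_gib, 1))                 # ~27.9 GiB
```

So the weights themselves leave roughly 24 GiB of an 80GB A100 for activations and runtime overhead.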