facebookincubator/AITemplate

Tensor Parallelism

kamalkraj opened this issue · 3 comments

Is it possible to implement Tensor Parallelism using AITemplate?
@tenpercent @apivovarov

Hi,

Currently we don't have plans for intra-op parallelism. Are you running into a case where the model doesn't fit in a single device's memory?

Hi @muchulee8,

I am looking for the best practice for optimizing and deploying a 30B LLM model using Multi GPUs.

One possible solution would be to split the model by transformer blocks and store the weights on separate GPUs. This isn't available out of the box in AITemplate and would require some manual routing of activations to inputs between the engines.
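The manual routing idea could be sketched roughly as follows. This is a hypothetical illustration, not an AITemplate API: the "engines" are plain Python callables standing in for compiled per-stage modules, and `copy_to_device` is a stub for the actual host/device transfer you would do between GPUs.

```python
# Hypothetical sketch: split a model by transformer blocks into two "engines"
# and route activations -> inputs between them by hand.

def make_engine(blocks):
    """Return a callable that runs a list of per-block functions in order."""
    def engine(x):
        for block in blocks:
            x = block(x)
        return x
    return engine

# Toy "transformer blocks": block i just adds i to every element.
all_blocks = [lambda x, i=i: [v + i for v in x] for i in range(8)]

# Split by block: first half conceptually on GPU 0, second half on GPU 1.
engine_gpu0 = make_engine(all_blocks[:4])   # blocks 0-3
engine_gpu1 = make_engine(all_blocks[4:])   # blocks 4-7

def copy_to_device(activations, device):
    """Stub for the inter-GPU transfer between engines (e.g. a peer copy)."""
    return list(activations)

x = [0.0, 1.0]
h = engine_gpu0(x)                 # run the first half of the model
h = copy_to_device(h, "cuda:1")    # manual activation -> input routing
y = engine_gpu1(h)                 # run the second half
print(y)  # [28.0, 29.0] since 0+1+...+7 = 28 is added to each element
```

The per-stage split and the explicit transfer in the middle are exactly the parts that would have to be wired up by hand today.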

Another option is to wait until quantization is supported. I think int8/fp8 support is going to happen eventually.

Finally, it sounds like 30B parameters might fit on an 80GB A100 in fp16 precision?
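The fp16 fit can be checked with back-of-the-envelope arithmetic: 2 bytes per parameter for the weights alone. Note this counts only weights; activations and any KV cache add overhead on top, so the real fit depends on batch size and sequence length.

```python
# Weight memory for a 30B-parameter model in fp16 (2 bytes per parameter).
params = 30e9
bytes_per_param = 2                       # fp16
weight_gib = params * bytes_per_param / 2**30
print(round(weight_gib, 1))               # ~55.9 GiB of weights

# int8 quantization would roughly halve that.
int8_gib = params * 1 / 2**30
print(round(int8_gib, 1))                 # ~27.9 GiB
```

So the weights themselves leave roughly 24 GiB of an 80GB A100 for activations and runtime overhead.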