aws/sagemaker-huggingface-inference-toolkit

No support for multi-GPU


It seems that it's not possible to run models on multiple GPUs, e.g. by passing device_map="auto" to a pipeline.

Is there any way to work around this limitation?
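For context, here is a minimal sketch of the kind of custom handler where one would try this. model_fn and predict_fn are the toolkit's standard override hooks; the task and payload keys are illustrative assumptions, not a confirmed working setup:

```python
# inference.py -- a minimal sketch of a custom handler for the
# Hugging Face Inference Toolkit; the task and payload keys below
# are illustrative assumptions.
from transformers import pipeline

def model_fn(model_dir):
    # device_map="auto" asks accelerate to shard the model across all
    # visible GPUs -- the behavior this issue reports as unsupported.
    return pipeline("text-generation", model=model_dir, device_map="auto")

def predict_fn(data, pipe):
    return pipe(data["inputs"], **data.get("parameters", {}))
```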

What model are you trying to run? For LLMs we recommend using the LLM container.

Yes, I moved over to the TGI container. I had started with the generic container, since that's what is used by https://registry.terraform.io/modules/philschmid/sagemaker-huggingface/aws/latest. The lack of multi-GPU support was just quite surprising, especially since there doesn't really seem to be any specific reason for not having it (I suppose no one has simply implemented it yet).
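For anyone else landing here, this is a minimal sketch of deploying via the LLM (TGI) container with the SageMaker Python SDK; the model ID, role ARN, version, and instance type are example values, not a verified configuration:

```python
# A minimal sketch of a TGI deployment on SageMaker; all concrete
# values (role ARN, model ID, version, instance type) are examples.
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

llm_image = get_huggingface_llm_image_uri("huggingface", version="1.1.0")

model = HuggingFaceModel(
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # example ARN
    image_uri=llm_image,
    env={
        "HF_MODEL_ID": "tiiuae/falcon-7b-instruct",  # example model
        "SM_NUM_GPUS": "4",  # TGI shards the model across this many GPUs
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # example 4-GPU instance
)
```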

I think this blog post should answer your question @parviste-fortum: https://www.philschmid.de/sagemaker-multi-replica