awslabs/multi-model-server

[Q] GPU support

oonisim opened this issue · 3 comments

The AWS documentation (https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html) states: "Multi-model endpoints are not supported on GPU instance types."

Could you kindly explain whether this is technically impossible or simply not yet implemented?
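For context, the restriction applies at endpoint-configuration time, where the instance type is chosen. A minimal boto3 sketch of how a multi-model endpoint is set up (the model name, role ARN, container image, and S3 prefix below are placeholders, not values from this issue):

```python
import boto3

sm = boto3.client("sagemaker")

# A multi-model endpoint is declared via Mode="MultiModel" on the container;
# ModelDataUrl points at an S3 prefix holding many model artifacts.
sm.create_model(
    ModelName="my-multi-model",  # placeholder
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    PrimaryContainer={
        "Image": "<account>.dkr.ecr.<region>.amazonaws.com/my-inference-image",  # placeholder
        "Mode": "MultiModel",
        "ModelDataUrl": "s3://my-bucket/models/",  # placeholder prefix
    },
)

# The documented restriction bites here: InstanceType must be a CPU type
# (e.g. ml.m5.xlarge); a GPU type such as ml.p3.2xlarge would be rejected.
sm.create_endpoint_config(
    EndpointConfigName="my-multi-model-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-multi-model",
        "InstanceType": "ml.m5.xlarge",
        "InitialInstanceCount": 1,
    }],
)
```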

Hi @oonisim

Do you know how we can get inferences from a multi-model endpoint for models that require GPU memory?

Thanks

Hi @Vinayaks117, per the AWS documentation (https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html), "Multi-model endpoints are not supported on GPU instance types." I am not sure whether you can run Multi Model Server itself on GPU instances (see this repository for the implementation; I believe the behavior is framework-dependent, e.g. PyTorch vs. TensorFlow). Please open a case with AWS Support for a definitive answer; I am afraid that is the only way.
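For what it's worth, on the supported CPU instance types you select which model serves a request with the TargetModel parameter of InvokeEndpoint. A minimal sketch (the endpoint and artifact names are placeholders):

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

# TargetModel names an artifact under the endpoint's S3 prefix; SageMaker
# lazily loads it into the instance's (CPU) memory on first use.
response = runtime.invoke_endpoint(
    EndpointName="my-multi-model-endpoint",  # placeholder
    TargetModel="model-a.tar.gz",            # placeholder artifact name
    ContentType="application/json",
    Body=b'{"instances": [[1.0, 2.0, 3.0]]}',
)
print(response["Body"].read())
```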