[Q] GPU support
oonisim opened this issue · 3 comments
The AWS documentation (https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html) states: "Multi-model endpoints are not supported on GPU instance types."
Could you kindly clarify whether this is technically impossible, or just not yet implemented?
Hi @oonisim
Do you know how we can get inference from multi-model endpoints for models that require GPU memory?
Thanks
Hi @Vinayaks117. Per the AWS documentation (https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html), "Multi-model endpoints are not supported on GPU instance types", so I am not sure you can run the multi-model server on GPU instances (see the AWS GitHub for the multi-model server implementation; I believe it is framework dependent, e.g. PyTorch or TF). Please open a case with AWS support for a definitive answer. I am afraid that is the only way.
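For reference, here is a minimal sketch of how a multi-model endpoint is typically configured via boto3's SageMaker API, with a CPU instance type as the documentation requires. All names (model name, role ARN, image URI, S3 bucket) are placeholders, not values from this thread; the key points are `Mode: "MultiModel"` on the container and a CPU `InstanceType`.

```python
# Hypothetical sketch of multi-model endpoint parameters (placeholders throughout).
# Per the AWS docs linked above, the instance type must be CPU-based
# (e.g. ml.m5.xlarge); GPU types such as ml.p3.2xlarge are rejected.

create_model_params = {
    "ModelName": "my-multi-model",                           # placeholder name
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",
    "Containers": [{
        "Image": "<inference-container-image-uri>",          # framework serving container
        "Mode": "MultiModel",                                # enables multi-model serving
        "ModelDataUrl": "s3://my-bucket/models/",            # S3 prefix holding model artifacts
    }],
}

create_endpoint_config_params = {
    "EndpointConfigName": "my-mme-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "my-multi-model",
        "InstanceType": "ml.m5.xlarge",                      # must be a CPU instance type
        "InitialInstanceCount": 1,
    }],
}

# These dicts would be passed to the boto3 SageMaker client, e.g.:
#   sm = boto3.client("sagemaker")
#   sm.create_model(**create_model_params)
#   sm.create_endpoint_config(**create_endpoint_config_params)
```

At inference time, the specific model artifact is selected per request with the `TargetModel` parameter of `invoke_endpoint` on the SageMaker runtime client.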
Sure, thanks @oonisim.