InvocationsPerInstance not published by async sagemaker endpoint

Question


alexanderduben opened this issue 3 years ago · 3 comments

Hi, I took inspiration from your work and I'm glad you shared this example, but I think the SageMakerVariantInvocationsPerInstance metric specification can't be used to autoscale async endpoints. The official documentation (https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-monitor.html) states that the InvocationsPerInstance metric is not published for async endpoints. I've tested autoscaling with this metric, and the endpoint does not scale at all. Switching to a custom metric that async endpoints do publish should solve this issue.
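For illustration, a target-tracking configuration of the kind described above (the dictionary one might pass as `TargetTrackingScalingPolicyConfiguration` to Application Auto Scaling's `put_scaling_policy`) tracks the predefined `SageMakerVariantInvocationsPerInstance` metric. This is a minimal sketch: the helper name and the target value of 70 are hypothetical, and it only builds the dictionary without calling AWS. Since async endpoints do not publish `InvocationsPerInstance`, such a policy has no data to track, which is consistent with the lack of scaling reported here.

```python
def invocations_policy_config(target_invocations: float = 70.0) -> dict:
    """Hypothetical helper: target tracking on invocations per instance.

    Suitable for real-time endpoints. According to the AWS async inference
    monitoring docs, InvocationsPerInstance is not published by async
    endpoints, so this policy would have nothing to track there.
    """
    return {
        "TargetValue": target_invocations,
        "PredefinedMetricSpecification": {
            # Predefined metric type backed by InvocationsPerInstance.
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
        },
    }


config = invocations_policy_config()
print(config["PredefinedMetricSpecification"]["PredefinedMetricType"])
```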

Answer 1 · 2022-03-23T08:21:36.000Z

@alexanderduben You are correct: for async endpoints it needs to be ApproximateBacklogSizePerInstance. I'll try to apply this soon.
In the meantime, you can check out this example, which uses the SageMaker Python SDK to deploy. The correct metric is used there.
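A minimal sketch of a target-tracking policy on ApproximateBacklogSizePerInstance, the metric suggested in the answer above. It builds the keyword arguments for boto3's Application Auto Scaling `put_scaling_policy` call, with the metric defined as a `CustomizedMetricSpecification` in the `AWS/SageMaker` namespace using the `EndpointName` dimension. Assumptions: the helper and policy names are hypothetical, the variant name defaults to `AllTraffic` (the SageMaker Python SDK's usual default), and the target backlog and cooldowns are placeholder values to tune.

```python
def backlog_scaling_policy(endpoint_name: str,
                           variant_name: str = "AllTraffic",
                           target_backlog: float = 5.0) -> dict:
    """Hypothetical helper: kwargs for put_scaling_policy on an async endpoint."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    return {
        "PolicyName": "async-backlog-target-tracking",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Desired average number of queued requests per instance.
            "TargetValue": target_backlog,
            "CustomizedMetricSpecification": {
                # Published by async endpoints, unlike InvocationsPerInstance.
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
                "Statistic": "Average",
            },
            "ScaleInCooldown": 600,   # seconds; placeholder
            "ScaleOutCooldown": 300,  # seconds; placeholder
        },
    }


policy = backlog_scaling_policy("my-async-endpoint")  # hypothetical endpoint
print(policy["ResourceId"])
```

With AWS credentials configured and the variant already registered via `register_scalable_target`, the result could be applied with `boto3.client("application-autoscaling").put_scaling_policy(**policy)`. Keeping the dictionary construction separate from the API call makes the configuration easy to inspect before anything is changed on the account.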

Answer 2 · 2023-07-17T11:05:40.000Z

@philschmid, has there been any progress on this?

Answer 3 · 2023-07-17T11:47:13.000Z

No, I don't have time at the moment. Feel free to open a PR if you do.