InvocationsPerInstance not published by async sagemaker endpoint
alexanderduben opened this issue · 3 comments
Hi, I took inspiration from your work and I am glad you've shared this example, but I think the SageMakerVariantInvocationsPerInstance metric specification can't be used for autoscaling async endpoints. The official docs (https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-monitor.html) state that the InvocationsPerInstance metric is not published for async endpoints. I tested autoscaling with this metric and the endpoint does not scale at all. Switching to a custom metric that async endpoints do publish should solve this issue.
@alexanderduben You are correct, for async endpoints it needs to be ApproximateBacklogSizePerInstance. I'll try to apply this soon.
In the meantime, you can check out this example, which uses the python-sagemaker-sdk to deploy. The correct metric is used there.
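For anyone who needs a workaround before this is fixed, here is a rough sketch of a target-tracking policy on ApproximateBacklogSizePerInstance using boto3's Application Auto Scaling client. This is not the code from this repo or from the linked example. The helper names, the variant name `AllTraffic`, the target value, the cooldowns, and the capacity limits are all assumptions to adjust for your endpoint.

```python
def build_backlog_tracking_config(endpoint_name, target_backlog=5.0,
                                  scale_in_cooldown=600, scale_out_cooldown=300):
    """Build a target-tracking config for an async endpoint (hypothetical helper).

    The target value and cooldowns are illustrative defaults, not recommendations.
    """
    return {
        "TargetValue": target_backlog,
        # Async endpoints need a customized metric instead of the predefined
        # SageMakerVariantInvocationsPerInstance metric.
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": scale_in_cooldown,
        "ScaleOutCooldown": scale_out_cooldown,
    }


def apply_autoscaling(endpoint_name, variant_name="AllTraffic",
                      min_capacity=1, max_capacity=4):
    """Register the variant as a scalable target and attach the policy.

    Requires boto3 and AWS credentials; variant name and capacities are assumptions.
    """
    import boto3  # imported lazily so the config builder works without boto3

    client = boto3.client("application-autoscaling")
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        MinCapacity=min_capacity,
        MaxCapacity=max_capacity,
    )
    client.put_scaling_policy(
        PolicyName=f"{endpoint_name}-backlog-tracking",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=build_backlog_tracking_config(
            endpoint_name
        ),
    )
```

Calling `apply_autoscaling("my-async-endpoint")` (a placeholder endpoint name) would register the variant and attach the policy. The config builder is kept separate so the metric specification can be inspected or reused without touching AWS.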
@philschmid has there been any progress on this?
No, I don't have time at the moment. Feel free to open a PR if you do.