state.get bad performance under cold and warm starts
shazron opened this issue · 7 comments
See investigations under #63
Expected Behaviour
Under a cold start, a state.get should take less than a second.
Actual Behaviour
Under a cold start, a state.get takes approximately 1800ms.
Possible issues
On a warm start, a state.get still takes approximately 450ms.
The @azure/cosmos promise that is resolved here takes up 99.9% of the time for a state.get call:
aio-lib-state/lib/impl/CosmosStateStore.js
Line 128 in e296e3e
- https://github.com/Azure/azure-sdk-for-js/blob/1a77eb5fae58a3bb7ced6610dbdf9afe500bab60/sdk/cosmosdb/cosmos/src/client/Container/Container.ts#L109
- https://github.com/Azure/azure-sdk-for-js/blob/1a77eb5fae58a3bb7ced6610dbdf9afe500bab60/sdk/cosmosdb/cosmos/src/client/Item/Item.ts#L73
I don't think any more code optimizations are possible here, since the bottleneck seems to be the CosmosDB read call. The only possible solutions I can see are:
- (network) perhaps the data is read from a distant Azure region, increasing network latency?
- (server) perhaps there is a configuration setting in Azure that will help with CosmosDB NoSQL reads? (partitioning key strategies?)
- (client) perhaps there is a more optimal way to use the @azure/cosmos SDK?
JIRA issue created: https://jira.corp.adobe.com/browse/ACNA-1155
Suggestions from the team:
- Re-test with a VPN connection to the US or Europe (the test was from Singapore / India VPN)
- Re-test with just the bare @azure/sdk, for possible inclusion in a bug to be filed with Azure. However, the perf timings are already granular and measure the @azure/sdk itself (I modified the @azure/sdk Node.js code to add the timings).
- Direct mode for the @azure/sdk for Node.js (Azure/azure-sdk-for-js#4807): this is currently only available for the Java SDK. There is no ETA for Node.js support. According to a comment on the linked issue, direct mode support for the Java SDK took them 8 months with 3 devs.
We already have multi-region support (US and Europe) so suggestion 1 could help isolate the issue.
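When comparing re-test runs from different locations, a single timing can be misleading, so it may help to collect several samples per location and compare medians and tail latencies. A minimal sketch, assuming the timings are gathered as plain arrays of milliseconds; `percentile`, `summarize`, and the location names are hypothetical, and the numbers are placeholders, not measurements from this issue.

```javascript
// Hypothetical helper: nearest-rank percentile over latency samples (ms).
function percentile(samples, q) {
  if (samples.length === 0) {
    throw new Error('percentile() needs at least one sample');
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((q / 100) * sorted.length));
  return sorted[rank - 1];
}

// Condenses one location's samples into a few comparable numbers.
function summarize(samples) {
  return {
    count: samples.length,
    min: Math.min(...samples),
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    max: Math.max(...samples),
  };
}

// Illustrative placeholder numbers, not measurements from this issue.
const runsByLocation = {
  'location-a': [120, 100, 110, 400, 105],
  'location-b': [60, 55, 58, 300, 57],
};

for (const [location, samples] of Object.entries(runsByLocation)) {
  console.log(location, summarize(samples));
}
```

If the median differs clearly between locations while the rest of the setup is unchanged, that would point towards network latency rather than the SDK.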
Thanks for the summary, @shazron!
What about re-testing with direct calls to the Azure HTTP API (i.e. not using the @azure/sdk at all)?
Good idea. That would help isolate whether there is something else in the SDK that is causing the bottleneck, not the network call itself.
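A rough sketch of what a direct call could look like, assuming master-key authentication against the Cosmos DB REST API as I understand it from Microsoft's access control documentation. The function names, the account name, the key, the ids, and the `x-ms-version` value are all assumptions for illustration and should be checked against the current REST API docs. The code only builds the request description and does not send anything.

```javascript
const crypto = require('node:crypto');

// Builds the master-key authorization token: an HMAC-SHA256 signature over
// verb, resource type, resource link and date, then URL-encoded.
function buildAuthToken(verb, resourceType, resourceLink, dateUtc, masterKeyBase64) {
  const key = Buffer.from(masterKeyBase64, 'base64');
  const payload =
    `${verb.toLowerCase()}\n${resourceType.toLowerCase()}\n` +
    `${resourceLink}\n${dateUtc.toLowerCase()}\n\n`;
  const signature = crypto.createHmac('sha256', key).update(payload, 'utf8').digest('base64');
  return encodeURIComponent(`type=master&ver=1.0&sig=${signature}`);
}

// Describes a "get document" request without sending it.
// Assumes ids that need no URL escaping.
function buildGetDocumentRequest(
  { account, masterKey, database, container, id, partitionKey },
  now = new Date()
) {
  const resourceLink = `dbs/${database}/colls/${container}/docs/${id}`;
  const dateUtc = now.toUTCString();
  return {
    method: 'GET',
    url: `https://${account}.documents.azure.com/${resourceLink}`,
    headers: {
      authorization: buildAuthToken('GET', 'docs', resourceLink, dateUtc, masterKey),
      'x-ms-date': dateUtc,
      'x-ms-version': '2018-12-31', // assumed API version
      'x-ms-documentdb-partitionkey': JSON.stringify([partitionKey]),
      accept: 'application/json',
    },
  };
}

// All values below are hypothetical placeholders.
const request = buildGetDocumentRequest({
  account: 'my-account',
  masterKey: Buffer.from('not-a-real-key').toString('base64'),
  database: 'my-db',
  container: 'my-container',
  id: 'my-key',
  partitionKey: 'my-partition',
});
console.log(request.method, request.url);
```

The resulting `url` and `headers` could then be passed to any HTTP client and timed the same way as the SDK call, which would separate SDK overhead from the network round trip.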
Stale and no longer valid: v4 of this lib connects to a new State store, which will have different behaviour.
No changes will be made to the old state store.