adobe/aio-lib-state

state.get bad performance under cold and warm starts

shazron opened this issue · 7 comments

See investigations under #63

Expected Behaviour

Under a cold start, a state.get should take less than a second.

Actual Behaviour

Under a cold start, a state.get will take approx 1800ms.

Possible issues

On a warm start, a state.get will still take approx 450ms

The @azure/cosmos promise resolved here takes up 99.9% of the time for a state.get call:

  1. const response = await _wrap(this._cosmos.container.item(key, this._cosmos.partitionKey).read(), { key })
  2. https://github.com/Azure/azure-sdk-for-js/blob/1a77eb5fae58a3bb7ced6610dbdf9afe500bab60/sdk/cosmosdb/cosmos/src/client/Container/Container.ts#L109
  3. https://github.com/Azure/azure-sdk-for-js/blob/1a77eb5fae58a3bb7ced6610dbdf9afe500bab60/sdk/cosmosdb/cosmos/src/client/Item/Item.ts#L73
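To make this kind of measurement reproducible, here is a minimal sketch of a timing harness. It is not the instrumentation used for the numbers above. The names `timeAsync`, `measure`, and `makeFakeRead` are hypothetical, and the read is replaced by a stub with artificial delays so the snippet runs without Azure credentials. Comparing the first call in a process against later calls is only a rough proxy for cold and warm starts.

```javascript
const { performance } = require('perf_hooks')

// Hypothetical helper: times one async call, returns its result and duration in ms.
async function timeAsync (label, fn) {
  const start = performance.now()
  const result = await fn()
  const ms = performance.now() - start
  return { label, ms, result }
}

// Hypothetical helper: runs the same call sequentially `runs` times.
// The first sample is reported separately from the average of the rest.
async function measure (fn, runs = 5) {
  const samples = []
  for (let i = 0; i < runs; i++) {
    const { ms } = await timeAsync(`run-${i}`, fn)
    samples.push(ms)
  }
  const [first, ...rest] = samples
  const warmAvg = rest.length
    ? rest.reduce((a, b) => a + b, 0) / rest.length
    : NaN
  return { first, warmAvg, samples }
}

// Stand-in for the real read (assumption: not the actual CosmosDB call).
// The first invocation resolves after `coldMs`, later ones after `warmMs`.
function makeFakeRead (coldMs, warmMs) {
  let calls = 0
  return () => new Promise(resolve => {
    const delay = calls++ === 0 ? coldMs : warmMs
    setTimeout(() => resolve({ resource: { value: 'stub' } }), delay)
  })
}
```

For a real run, the stub would be swapped for something like `() => container.item(key, partitionKey).read()`, assuming `container` is an already initialised @azure/cosmos container, for example `measure(() => container.item(key, partitionKey).read(), 10)`.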

I don't think any further code optimizations are possible here, since the bottleneck seems to be the CosmosDB read call. The only possible solutions I can see are:

  1. (network) perhaps the data is read from a far-away Azure region, increasing network latency?
  2. (server) perhaps there is a configuration setting in Azure that will help with CosmosDB NoSQL reads? (partitioning key strategies?)
  3. (client) perhaps there is a more optimal way to use the @azure/cosmos SDK?

Suggestions from the team:

  1. Re-test with a VPN connection to the US or Europe (the test was from Singapore / India VPN)
  2. Re-test with just the bare @azure/sdk -- for possible inclusion in a bug to be filed with Azure. The perf timings, however, are already granular and test the @azure/sdk itself (I modified the @azure/sdk node code to add the timings).
  3. Direct mode for the @azure/sdk for Node.js (Azure/azure-sdk-for-js#4807): this is currently only available for the Java SDK. There is no ETA for Node.js support; according to a comment on the linked issue, direct mode support for the Java SDK took them 8 months with 3 devs.

We already have multi-region support (US and Europe) so suggestion 1 could help isolate the issue.

Thanks for the summary @shazron !
What about re-testing with direct calls to the Azure HTTP API (i.e. not using the @azure/sdk at all)?

> What about re-testing with direct calls to the Azure HTTP API (i.e. not using the @azure/sdk at all)?

Good idea. That would help isolate whether there is something else in the SDK that is causing the bottleneck, not the network call itself.

Getting auth error even though I have access to app builder

Stale, and not valid anymore - v4 of this lib connects to a new State store which will have different behaviour.
No changes will be made to the old state store.