[Performance] `Onyx.get` batching
kidroca opened this issue · 23 comments
In the benchmarks performed here: Expensify/App#2667 (comment) I've noticed that sequential reads happen very fast 8-9ms
, much faster than what it takes in E.cash
I think the reason is that Onyx
will blast AsyncStorage with as many calls as it need.
Browsing the AsyncStorage
code I've found this optimization
if (!this._immediate) {
this._immediate = setImmediate(() => {
this._immediate = null;
this.flushGetRequests();
});
}
/.../
this._getRequests.push(getRequest);
// avoid fetching duplicates
keys.forEach((key) => {
if (this._getKeys.indexOf(key) === -1) {
this._getKeys.push(key);
}
});
When multiGet
is called multiple times in a very short span of time (1 tick) it would batch all the calls that happen and invoke flushGetRequests
on the next tick
- 1 trip across the bridge
- 1 sql query
From the docs of flushGetRequests
The following batched functions are useful for executing a lot of operations at once, allowing for native optimization
I think that's exactly what happens when E.cash starts - a lot of operations at once
Proposal
Test the following theory:
Update Onyx.get
to use multiGet
internally when it fetches data from AsyncStorage
this way the operations that happen at the same time would be batched
We can try implementing our own get
batching that waits a bit longer 2-3ms
to capture more get calls and then make a multiGet
request
Sample benchmarks
Reading a random key and ~1kb
text value
Based on 50+ tests of each
- Getting 10 items using
AsyncStorage.getItem
in parallel:- time to complete: min: 53.7ms, max: 72.1ms, avg: 62ms
- Getting 10 items using a single
AsyncStorage.multiGet
:- time to complete: min: 21.3ms, max: 95.7ms, avg: 32.3ms
- Getting 100 items using
AsyncStorage.getItem
in parallel:- time to complete: min: 546.1ms, max: 644.8ms, avg: 602.7ms
- Getting 100 items using a single
AsyncStorage.multiGet
:- time to complete: min: 46.4ms, max: 76.6ms, avg: 59.6ms
I've observed that there are gains even for 10
parallel calls made to AsyncStorage.getItem
, while as we get higher the performance degrades exponentially, multiGet
is not affected as much by the 10x increase of throughput
Additional notes
Write operations take about twice as much and can be optimized the same way
For my sample chat there are around 80-100 calls to Onyx.set
during init with the following timings:
- max: 2.08sec
- min: 22.200ms
- avg: 275.237ms
I like the idea of batching everything in one tick, but I'm not a fan of batching arbitrarily based upon time frames. "Magic number" solutions where we pick some arbitrary number (why 2ms vs 20m or 200ms) out of the hat require constant maintenance find the right numbers, and those numbers can change that are effectiveness unpredictably.
I like the idea of batching until the JS engine is idle (which if I understand correctly, is what you mean by one tick), but I'm not sure the argument why we would wait longer than that.
why 2ms vs 20m or 200ms
Let's say an average read from disk takes anywhere between 50ms to 500ms, by delaying execution by up to 10% (2-5ms) we can save the roundtrip for potential requests that might happen in this window of time
This is very likely to happen during init where more data can be captured this way
- a tick might be enough to batch 2 requests where a "forgivable" delay would capture 20
- it's a tradeoff hard reads would happen 2-5ms slower but every time a 2nd request is batched you save 50-500ms
- cached reads are unaffected and still happen instantly
- we can drop the optimization after the app hydrates
The idea here is the first get
triggers the batch of 2-5ms then the batch flushes
along with any other gets
arrived by that time
I like the idea of batching until the JS engine is idle (which if I understand correctly, is what you mean by one tick)
We can't reliably detect that, and it might be too long to wait to get idle, usually reads are triggered by components that need data "now" and can't afford to wait to idle. Going this direction would make complicated logic "to feed the most hungry first"
"next tick" is the next iteration of the event loop, it's very similar to a setTimeout()
with a 0ms delay. It might be to small to capture Onyx connections resulting from component render batches
Just to confirm, reads are triggered by user or network action right? So the only difference between "batching one tick" or "batching one tick + 20ms" is the possibility that additional user/network activity happened in that tiny 20ms window, right? It feels like the odds of that helping are extremely low.
Rendering a screen does not happen all at once, it also does not trigger the needed Onyx connections at once - it's more like an avalanche effect
A tree is rendered from the children to the parent
First the child components mount and their lifecycle methods triggered (componentDidMount)
Some of these children will initiate Onyx connections
Then the component branch above this tree renders, it again triggers Onyx connections
Then the next and so on
This process does not happen in a single tick, but happens very fast
Implementing batching with a delay/debounce can capture more get
calls as the avalanche is building up
We did something similar with hooking into existing reads - when 2 components want the same thing, but it's not in cache
. When the 2nd component mounts instead of starting a new call to get the "thing" it hooks into the pending read result
It was proven that this happens and is beneficial
This happens mostly during init where many components are added to the tree as the app initializes
This is the theory
- App init
- static
Onyx.connect
calls in.js
files are evaluated first. These are the calls most likely to get picked up in a batch within a singletick
- Auth state and data is evaluated and the app continues to render the relevant component Tree
- Tick(s) pass
- Navigation is mounting
- Tick(s) pass
- Top level navigator stacks are mounting
- Tick(s) pass
- The active screen begins mounting
- As we get deeper different Onyx data needs to be resolve
We won't be able to capture all of this in a single batch as some items are codependent - you need to resolve Auth to continue with the rest of the app, but if we manage to group larger batches we should see a big improvement in time to resolve data from storage
Great, thank you for that! However, there still isn't actually any "delay" between the avalanching cycles -- if one cycle of Onyx.gets() triggers a second cycle of Onyx.gets(), it's not like it waits 2-20ms between "ticks". Rather, immediately after one tick completes, the next one starts.
So aren't we really trying to wait for all the cycles to stop avalanching, before doing any Onyx.gets()? In other words, we aren't actively looking to wait any amount of time at all. Rather, we just want to wait until the avalanche is "done" before sending all the get over in a large batch.
If that's correct, are you saying that it's in practice very difficult to detect when the avalanche is done, and thus there's no more need to wait? Why wouldn't setTimeout(0)
work -- why does the delay need to be nonzero?
So aren't we really trying to wait for all the cycles to stop avalanching, before doing any Onyx.gets()? In other words, we aren't actively looking to wait any amount of time at all. Rather, we just want to wait until the avalanche is "done" before sending all the get over in a large batch.
Yes ideally we would hook to some event instead of waiting an arbitrary time, but I'm not sure if that's possible at all.
Why wouldn't setTimeout(0) work -- why does the delay need to be nonzero?
setTimeout(0)
would be about the same as with the next tick, and is just not enough
Hey, remember the xlsx that I did originally they contain start time information of each Onyx.get
call, check the start times of the get
calls here: #70 (comment)
2ms window
- 1350859.00ms
- 1350861.00ms
5ms window
- 1350935.00ms
- 1350937.00ms
- 1350939.00ms
Here's a full xls: https://docs.google.com/spreadsheets/d/15GkUEUOyVLk02y8_-PFAJFF4Ej7OsyXfdbyJgFoC_4Q/edit?usp=sharing
The only time I see where get
calls are on the same millisecond is for calls that happen in a foreach cycle, but this is handled in #78 where instead of for-eaching we can directly call multiGet
with all the keys
So in reality there are calls that happen in very close proximity, it's just how React rendering works and Onyx relies on that.
A batching with delay
can serve as a easy POC whether we'll gain something out of this and then we can try to improve by hooking to something exposed by React to sync with it's rendering process
Some more timing information, this time on iOS
iOS Onyx.get timings durring init.xlsx
Calls are ordered by startTime
A lot of keys are retrieved between 1.563 to 1.567
The ones taking less than 10ms are coming from cache - you can ignore them
But there are a a lot of other keys that can be batched together for a hard read
I think that's exactly what happens when E.cash starts - a lot of operations at once
I'm having trouble following some of this as it's a bit over my head. But it sounds like we are proposing that batching calls to Onyx.get()
will improve the time it takes for the app to start (+ other things more generally) since we "get a lot of things from storage" when the app first boots up.
Does that sound accurate?
Can we easily test this by:
- Time out how long it takes to cold boot the app before and after these changes
- Time how long it takes to render chat A while chat B is open before and after these changes
I've tested this while working on #88
just using multiGet
instead of get
is enough to batch ~30 keys right at the start of the app
Then as the app continues to load 2-3 keys are fetched together from storage occasionally
We can try implementing our own get batching that waits a bit longer 2-3ms to capture more get calls and then make a multiGet request
Looks like this wont be needed, or we can use setImmediate
Even though retrieval became about twice as fast, the app seem to became slower as many withOnyx
component would receive data at once , which caused hangs
Having some delays between notifying Onyx subscribers seem to make the app to run better...
just using multiGet instead of get is enough to batch ~30 keys right at the start of the app
Then as the app continues to load 2-3 keys are fetched together from storage occasionally
Even though retrieval became about twice as fast, the app seem to became slower as many withOnyx component would >receive data at once , which caused hangs
Cool thanks for the summary! Lemme know if I've got the general idea correct...
- We could batch calls to
Onyx.get()
when the app inits and this will make data retrieval twice as fast vs. not doing this
Even though retrieval became about twice as fast, the app seem to became slower as many withOnyx component would receive data at once , which caused hangs
- However, this creates a new problem to deal with which is...
the app seem to became slower as many withOnyx component would receive data at once
And this is not normally observable because withOnyx
connected components are usually updated in a more staggered way and not all at once? Any more thoughts on the React rendering side of this? Why is it that when we put space between updating subscribers there are no issues, but when we update them all at once there is a hanging effect?
The "hang" effect happens when switching chats and only on Android.
It seems to be related to:
- general CPU intensive rendering of that view on a "weaker" device cause it to hang. This might be improved if the chat items become cheaper to render
- staggering seems to help a lot in Android, otherwise it's overwhelmed with data it has to render, maybe this is a part where the Virtualization Config can play a role and not allow to pre-render too much. Other FlatList / ActionItem improvements can help as well
- iOS native implementation of the FlatList might be better than Android
Since I already have the change in git history, I can put it in a separate branch to test again after improvements in E.cash
Since I already have the change in git history, I can put it in a separate branch to test again after improvements in E.cash
Sounds good. Let's put this on HOLD for now.
Gonna take this back off hold now so we can keep discussing this...
I think that's exactly what happens when E.cash starts - a lot of operations at once
@kidroca Seems like this is happening for sure. For good measure, did some logging from the native layer and can see about 400 separate multiGet
calls in the native layer when the app inits. Saved some of the data here (though not sure it's helpful as we already know there are many native calls happening).
Feels like we have a few options to explore, but I'm not sure which ones are worth pursuing next...
- Switch over to
multiGet
instead ofgetItem
- Do the second proposal in this issue
For both of these, I'm curious how the new Onyx version of a multiGet
would utilize the cache.getPendingTask()
logic we are using here. Just looking at what kinds of keys are getting retrieved on init and it seems likely that we might try to multiGet
keys we are already in the process of getting.
AsyncStorage.getItem
would internally call the native module multiGet
method - this is the reason for the big number of separate multiGet
call that fetch a single key
But AsyncStorage.multiGet
JS implementation is actually better suited for single gets, because it uses setImmediate
to schedule a single native multiGet
request for all the individual calls that happened up until that moment
I'm surprised why that's not used internally for getItem
as well
I think we can implement both proposals - they both need a multiGet
implementation anyway
- Implement
mutliGet
and use it instead ofgetItem
(kidroca@f29cfc9) - Just use
mutliGet
when we're dealing with collection keys (kidroca@137012b)
I'm curious how the new Onyx version of a multiGet would utilize the cache.getPendingTask()
Here's how that is handled:
- We're still registering a separate pending task for each key we don't have in cache
- Then make a single multi get request for those keys
- At the end all separate pending tasks are resolved at once when the multiGet finishes
When multiGet
is called with a bunch of keys we first check if we can derive some from cache, or pending tasks, and initiate a hard read for the rest
One thing to note is there won't be much use for the capture/pending task logic if we fetch all finite items with one multiGet
at the start
There just won't be overlapping key requests
I was thinking of opening a separate ticket but here seems a good place to discuss as well
Proposal
Onyx keeps track of keys with infinite size - the "Safe Eviction Keys". The data stored under such keys can grow as long as there is storage for it.
ATM the list is solely the Report Action history
Using the "safe eviction keys" and "all keys" we can derive the "finite keys"
These are small size keys that are no larger than a kB
each
They contain frequently used information like session
, network
, currentUrl
, and general report information
The typical size these keys take is around 20-30 kB of data
The only thing that makes the size vary is the general report information, each report would add 1kB
of information
Problem
Currently the entire "finite" information is requested and read during init, it just happens in chunks as each components subscribes to different data
Solution
Identify all "finite" keys and load them into cache with one multiGet request
Then initialize the app
Additional Consideration
Since general report information (typically needed to display reports in the LHN) is stored in separate keys and each one is about 1kB
it might not be desired to fetch these in one multiGet
Perhaps these keys should be flagged as safe eviction as well e.g. what if you have 1000 such keys
Correct me if I'm wrong but I think currently we read all such keys to fill the LHN list of conversations
IMO these keys should be flagged as safe eviction as well and at some point a paging solution introduced so that we don't have to read 1000 keys, but perhaps read them incrementally as the users scrolls the LHN
I think we should create a new issue if it's OK. I have some thoughts and would like to reply, but feels like a different problem to the problem of making many unoptimized data retrieval across the bridge.
I'm surprised why that's not used internally for getItem as well
This is a good question. I'd be curious to hear what the maintainers think.
We're still registering a separate pending task for each key we don't have in cache
Oh I see, so we will still use get(key)
it just won't call AsyncStorage.getItem()
and AsyncStorage.multiGet()
handles resolving the promises to the correct places in this code?
Btw, it also looks like this code avoids fetching duplicates so perhaps the pending task is not needed
// avoid fetching duplicates
keys.forEach((key) => {
if (this._getKeys.indexOf(key) === -1) {
this._getKeys.push(key);
}
FWIW, just swapping out getItem
for multiGet
with no other changes will batch a large number of keys during init.
I logged this from flushGetRequests
while starting up the Android app:
DEBUG [AsyncStorage] Getting keys ["currentURL", "preferredLocale", "credentials", "myPersonalDetails", "currentlyViewedReportID", "user", "countryCode", "policy_010931807F672475", "policy_08D0600B8AAE868D", "policy_137DA25D273F2423", "policy_43BC6076D3D1418B", "policy_456619EAC8A1A4BF", "policy_474ABBB288DAC2E0", "policy_6B40F9444C3F2EFF", "policy_7A35FD885AE104E1", "policy_A55A7962A1F001A4", "policy_C25E8D06D27010EB", "policy_CD36E1BEF9329100", "reportIOUs_73847679", "reportIOUs_74142904", "personalDetails", "reportActions_60432478", "reportActions_61607959", "reportActions_64217852", "reportActions_64218347", "reportActions_64218390", "reportActions_64218473", "reportActions_64220748", "reportActions_64220825", "reportActions_64224572", "reportActions_64224777", "reportActions_64224791", "reportActions_64224801", "reportActions_64226902", "reportActions_64533058", "reportActions_64867940", "reportActions_64889352", "reportActions_65092250", "reportActions_65094882", "reportActions_65219439", "reportActions_65279574", "reportActions_65298677", "reportActions_65341306", "reportActions_65357746", "reportActions_65501330", "reportActions_65504418", "reportActions_65608585", "reportActions_65637646", "reportActions_65639802", "reportActions_65728247", "reportActions_65759622", "reportActions_65764596", "reportActions_65969070", "reportActions_66048205", "reportActions_66135402", "reportActions_66282008", "reportActions_66387047", "reportActions_66600097", "reportActions_66701368", "reportActions_66903522", "reportActions_67776049", "reportActions_68940003", "reportActions_69104135", "reportActions_69104809", "reportActions_69353005", "reportActions_69379828", "reportActions_70008016", "reportActions_70270997", "reportActions_70768603", "reportActions_71054527", "reportActions_71245420", "reportActions_71401209", "reportActions_71787592", "reportActions_72162014", "reportActions_72643259", "reportActions_72645341", "reportActions_74050763", "reportActions_74200583", "reportActions_74538596", "reportActions_75168433", "reportActions_75320708", "reportActions_75326778", "reportActions_75362078", "reportActions_75380790", "reportActions_75380798", "reportActions_75380810", "reportActions_75382121", "reportActions_75691589", "reportActions_75950515", "reportActions_76185608", "reportActions_76408728", "reportActions_76578590", "reportActions_77036495", "reportActions_77154182", "reportActions_77291418", "reportActions_77334248", "reportActions_77543235", "reportActions_77583619", "reportActions_77883250", "reportActions_77996837", "reportActions_78058869", "reportActions_78395461", "report_60432478", "report_61607959", "report_61717071", "report_64217661", "report_64217852", "report_64218347", "report_64218390", "report_64218473", "report_64220748", "report_64220825", "report_64224572", "report_64224777", "report_64224791", "report_64224801", "report_64226902", "report_64533058", "report_64867940", "report_64889352", "report_64981914", "report_64993910", "report_65030406", "report_65083262", "report_65092250", "report_65094882", "report_65125296", "report_65170196", "report_65171050", "report_65219439", "report_65279574", "report_65285640", "report_65298677", "report_65341306", "report_65357746", "report_65376110", "report_65501330", "report_65504418", "report_65608585", "report_65637646", "report_65639802", "report_65693396", "report_65728247", "report_65759622", "report_65764596", "report_65851328", "report_65969070", "report_66048205", "report_66112934", "report_66120017", "report_66120977", "report_66135402", "report_66282008", "report_66378905", "report_66379571", "report_66382218", "report_66382245", "report_66384585", "report_66387047", "report_66431860", "report_66546050", "report_66553044", "report_66600097", "report_66603474", "report_66674648", "report_66701368", "report_66706140", "report_66838520", "report_66867504", "report_66903522", "report_66927665", "report_67381113", "report_67776049", "report_67858265", "report_67865722", "report_67968295", "report_67977534", "report_67999703", "report_68074734", "report_68104913", "report_68231412", "report_68248275", "report_68374837", "report_68374863", "report_68374866", "report_68411449", "report_68444993", "report_68597795", "report_68600061", "report_68621407", "report_68673342", "report_68694171", "report_68696210", "report_68696216", "report_68867940", "report_68918628", "report_68940003", "report_68992666", "report_69104135", "report_69104809", "report_69171208", "report_69180191", "report_69213673", "report_69289882", "report_69335433", "report_69353005", "report_69367134", "report_69379828", "report_69396275", "report_69396438", "report_69423271", "report_69530167", "report_69564775", "report_69590301", "report_69590603", "report_69624153", "report_69865968", "report_69905551", "report_69962302", "report_69993653", "report_70008016", "report_70270997", "report_70437049", "report_70491072", "report_70515871", "report_70768603", "report_70808211", "report_71054527", "report_71088813", "report_71163286", "report_71245420", "report_71304196", "report_71304204", "report_71304364", "report_71334762", "report_71338647", "report_71357584", "report_71401209", "report_71578546", "report_71615930", "report_71764953", "report_71787592", "report_71799824", "report_71867514", "report_72059927", "report_72162014", "report_72188098", "report_72206989", "report_72426645", "report_72626744", "report_72643259", "report_72645341", "report_73026878", "report_73115712", "report_73120478", "report_73734882", "report_73734893", "report_73734922", "report_73848337", "report_74050763", "report_74200583", "report_74486476", "report_74525032", "report_74538596", "report_75168433", "report_75320708", "report_75326778", "report_75362078", "report_75380790", "report_75380798", "report_75380800", "report_75380810", "report_75382121", "report_75573325", "report_75573326", "report_75573424", "report_75573426", "report_75573427", "report_75573429", "report_75573430", "report_75573431", "report_75573432", "report_75573433", "report_75573435", "report_75573436", "report_75691589", "report_75950515", "report_76185608", "report_76408728", "report_76578590", "report_77036495", "report_77154182", "report_77291418", "report_77334248", "report_77543235", "report_77583619", "report_77869117", "report_77883250", "report_77996837", "report_78058869", "report_78395461"]
That is about 300 or so keys normally are fetched one at a time over the bridge.
I kind of expected to maybe see some big reduction in start up time since the bridge flooding issue isn't happening anymore. But things are more or less the same. Which makes me wonder how much of a problem we have with the "bridge" or if the problem is somewhere else. Here's a quick side by side comparison of release builds (w/ multiGet
is a bit faster - but not much - and also take with a grain of salt as I am only focusing on start up time which varies a bit).
getItem.v.multiGet.mp4
I think these changes are maybe still worth doing, but the lack of a difference just makes me think that we are just "pushing the limits of AsyncStorage
".
Btw, it also looks like this code avoids fetching duplicates so perhaps the pending task is not needed
It avoids duplicates but only for the current batch, while the pending tasks consider retrievals that are half way through
FWIW, just swapping out getItem for multiGet with no other changes will batch a large number of keys during init.
Yep tried that as well some time ago, the effect was noticeable but the app was slower back then
Which makes me wonder how much of a problem we have with the "bridge" or if the problem is somewhere else.
This way of making get calls still has the overhead of creating 300+ promises since calls are still happening one by one even though they are batched in the end. This will also capture 300 more promises for the capture/pending task optimisation.
So creating 600 callbacks in such a short time might be the same as flooding the bridge.
We know we have a problem with how withOnyx
works, but I don't think that alone would cause this slowdown. Looking at the huge storage operation times I thought it was the bridge, perhaps something else is keeping it busy as well?
I've suggested to strip everything from the app try put it back together to identify what might be causing this, could it be the navigation setup with all the different Navigators that mount after auth? I colleague Android developer told me once that having many view stacks hurts performance
@marcaaron
I've created this ticket: Expensify/App#4433 regarding #84 (comment)
I think this can be closed since we don't use async storage anymore?