Input Method Data Version Optimization

Question

Opened this issue 3 years ago · 3 comments

Currently the Input Method Data and the Data Version are stored separately, which could introduce fragility: the version number can fall out of sync with the data it is supposed to describe. One potential solution would be to use a hash of the Data as that data's Version. MurmurHash3 may be a good algorithm for this. This is not a vital optimization!
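A minimal sketch of the idea in Python, assuming the 32-bit x86 variant of MurmurHash3 (MurmurHash3_x86_32), a seed of 0, and a version written as eight lowercase hex digits. The helper name `data_version` is hypothetical; in practice a maintained binding such as the `mmh3` package would likely be preferable to a hand-written port.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _fmix32(h: int) -> int:
    """MurmurHash3 finalization mix: makes every input bit affect every output bit."""
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3_x86_32: returns an unsigned 32-bit hash of data."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n_blocks = len(data) // 4

    # Body: mix each little-endian 4-byte block into the state.
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i : 4 * i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: mix the remaining 0-3 bytes, read little-endian.
    tail = data[4 * n_blocks :]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k

    # Finalization: fold in the length, then avalanche.
    h ^= len(data) & MASK32
    return _fmix32(h)


def data_version(data: bytes) -> str:
    """Hypothetical helper: the data's version is its hash as 8 hex digits."""
    return f"{murmur3_32(data):08x}"
```

With an empty input and seed 1 this port returns 0x514E28B7, which matches the commonly published test vector for MurmurHash3_x86_32.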

Answer 1 · 2021-12-28T14:31:29.000Z

I think we separated the version from the data file so that we wouldn't have to download the whole file every time. How large would a hash as a separate file be?

I do think going the hash route would be better; we don't really care about the version, we just want to know whether the file needs to be updated. Not having to maintain a version number would also make things simpler: just regenerate the hash, without needing to maintain continuity between versions.
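The flow described above could look roughly like this sketch: the server publishes the data alongside a tiny hash file, and the client downloads the full data only when that hash disagrees with the hash of its cached copy. `content_hash`, `publish`, and `needs_update` are hypothetical names, and `zlib.crc32` is swapped in for MurmurHash3 purely so the example needs nothing outside the standard library. Stored as eight hex characters, the hash file is 8 bytes (4 bytes if stored raw).

```python
from __future__ import annotations

import zlib


def content_hash(data: bytes) -> str:
    """Hash of the data file as 8 hex digits.

    zlib.crc32 stands in for MurmurHash3 only to keep this sketch
    dependency-free; any fixed 32-bit hash works the same way.
    """
    return f"{zlib.crc32(data):08x}"


def publish(data: bytes) -> tuple[bytes, bytes]:
    """Hypothetical server side: ship the data plus a tiny hash file.

    There is no version number to bump; the hash is simply regenerated
    whenever the data changes.
    """
    return data, content_hash(data).encode("ascii")


def needs_update(local_data: bytes | None, remote_hash_file: bytes) -> bool:
    """Hypothetical client side: fetch the full data file only when the
    small remote hash file disagrees with the hash of the local copy."""
    if local_data is None:
        return True  # nothing cached yet, so always download
    return content_hash(local_data) != remote_hash_file.decode("ascii").strip()
```

For example, after `data, hash_file = publish(b"a=1\nb=2\n")`, a client holding the same bytes gets `needs_update(data, hash_file) == False`, while a client holding stale data gets `True`.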

Answer 2 · 2021-12-28T14:38:14.000Z

I absolutely agree. It would be a tradeoff. Instead of storing a single-character version, we would store a hash, and I believe the smallest output size of many hash functions (including MurmurHash3) is 32 bits, or 4 bytes.

Basically, we are gaining robustness at the cost of periodically calculating a hash and storing a few more bytes (which shouldn't be a huge deal).
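The recurring cost mentioned above is one linear pass over the data each time it changes. A sketch of keeping that pass memory-bounded by hashing in chunks, assuming the data is read from a file-like object; `hash_stream` is a hypothetical name, and `zlib.crc32` is used because it accepts a running value, whereas the reference MurmurHash3 functions take the whole buffer in a single call.

```python
import io
import zlib
from typing import BinaryIO


def hash_stream(stream: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Hash a file in fixed-size chunks so memory use stays bounded.

    zlib.crc32 takes the running value as its second argument, so the
    chunked result equals hashing the whole file in one call.
    """
    crc = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        crc = zlib.crc32(chunk, crc)
    return f"{crc:08x}"


if __name__ == "__main__":
    # Usage: hash an in-memory stand-in for the input method data file.
    sample = b"a=1\nb=2\n" * 10_000
    print(hash_stream(io.BytesIO(sample), chunk_size=4096))
```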

Answer 3 · 2021-12-28T14:39:00.000Z

Let's do it!