xiaoyifang/goldendict-ng

Index implementation rethinking

Opened this issue · 6 comments

Is your feature request related to a problem? Please describe.
GoldenDict uses a custom B-tree implementation to store the index, and it serves the purpose well.

Describe the solution you'd like
Since we already use Xapian as the full-text search engine, is it possible, and necessary, to replace the custom B-tree implementation with Xapian as well?

I'll just leave the issue here for further consideration.

Some drawbacks:

  1. 0 sensitive in word
  2. Custom tokenization when processing phrases such as "a lot of"
  3. Performance: the more words there are, the slower it gets
  4. Unsorted

Possible solution
If we do not adopt Xapian, I would strongly favor trying RocksDB.
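One reason an ordered key-value store is attractive here is that RocksDB keeps keys sorted, so a prefix lookup can be a seek followed by a short forward scan. The sketch below only illustrates that access pattern with `std::map` as a stand-in; it does not use the RocksDB API, and `prefix_matches` and the `headword -> article offset` layout are hypothetical names chosen for the example, not goldendict-ng code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: collect up to `limit` headwords that start with `prefix`.
// std::map stands in for any store that keeps keys in sorted order.
std::vector<std::string> prefix_matches(
    const std::map<std::string, std::uint64_t>& index,
    const std::string& prefix,
    std::size_t limit) {
    std::vector<std::string> out;
    // lower_bound jumps to the first key >= prefix; matching keys are contiguous.
    for (auto it = index.lower_bound(prefix);
         it != index.end() && out.size() < limit; ++it) {
        if (it->first.compare(0, prefix.size(), prefix) != 0) {
            break;  // left the range of keys sharing the prefix
        }
        out.push_back(it->first);
    }
    return out;
}
```

With keys such as "a lot of", "alpha", "alphabet" and "beta", calling `prefix_matches(index, "alp", 10)` visits only the two matching entries plus the one that ends the scan, regardless of how many other words the index holds.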

Does parallel-hashmap provide a way to save the structure to a file?

greg7mdp/parallel-hashmap#146
Last time I used the parallel structure, I got an error.

Does parallel-hashmap provide a way to save the structure to a file?

Yes.

Dump/load feature: when a flat hash map stores data that is std::trivially_copyable, the table can be dumped to disk and restored as a single array, very efficiently, and without requiring any hash computation. This is typically about 10 times faster than doing element-wise serialization to disk, but it will use 10% to 60% extra disk space. See examples/serialize.cc. (flat hash map/set only)
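To show why a trivially copyable table can be dumped so cheaply, here is a minimal standard-library sketch of the underlying idea: the whole array is written and read back as one block of bytes, with no per-element serialization. It deliberately does not use the parallel-hashmap API (the library's own example is examples/serialize.cc); `Entry`, `dump_entries` and `load_entries` are hypothetical names, and the file format is assumed to be read on the same platform that wrote it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical index entry: hash of a headword -> offset of its article.
struct Entry {
    std::uint64_t key_hash;
    std::uint64_t article_offset;
};
static_assert(std::is_trivially_copyable<Entry>::value,
              "raw byte dumps are only valid for trivially copyable types");

// Write an element count followed by the raw array in a single fwrite.
bool dump_entries(const std::vector<Entry>& table, const std::string& path) {
    std::FILE* f = std::fopen(path.c_str(), "wb");
    if (!f) return false;
    const std::uint64_t n = table.size();
    bool ok = std::fwrite(&n, sizeof n, 1, f) == 1;
    if (ok && n > 0) {
        ok = std::fwrite(table.data(), sizeof(Entry), table.size(), f) == table.size();
    }
    return std::fclose(f) == 0 && ok;
}

// Read the count, size the vector once, then fill it with a single fread.
// Note: the count is trusted as-is; real code should validate it.
bool load_entries(std::vector<Entry>& table, const std::string& path) {
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) return false;
    std::uint64_t n = 0;
    bool ok = std::fread(&n, sizeof n, 1, f) == 1;
    if (ok) {
        table.resize(static_cast<std::size_t>(n));
        if (n > 0) {
            ok = std::fread(table.data(), sizeof(Entry), table.size(), f) == table.size();
        }
    }
    std::fclose(f);
    return ok;
}
```

Because headwords are variable-length strings, a layout like this would need fixed-size keys (hashes, as assumed here) or a separate string pool; that trade-off is part of what the trivially-copyable requirement implies.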

Or use SQLite or another suitable key-value DB.

rocksdb