Tencent/wwsearch

Have any benchmarks been run on this library?

Closed this issue · 0 comments

I notice that for each term in a document being inserted, the posting lists for all of that term's suffix strings are updated as well, which amounts to a huge number of IO operations on the inverted table.
For example:
Each entry is a [key, value] pair, where the value is a compressed list of doc ids.
Given a huge number of documents, a value can become very large, say tens of millions of ids.
Suppose each document contains an average of M terms and each term has an average length of L. Then each document needs about M*L key-value update operations, each on a potentially very large value. Although RocksDB's write batch mechanism alleviates the IO burden, the whole process is still expensive.
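
To make the M*L estimate concrete, here is a minimal Python sketch of the suffix indexing as I understand it. It is not wwsearch's actual code: the names `suffixes` and `index_document` are hypothetical, and a plain in-memory dict stands in for the RocksDB key-value store.

```python
from collections import defaultdict


def suffixes(term):
    """Return every non-empty suffix of term, longest first."""
    return [term[i:] for i in range(len(term))]


def index_document(index, doc_id, terms):
    """Append doc_id to the posting list of every suffix key.

    Returns the number of key-value updates performed, which is the
    sum of the term lengths (roughly M*L for M terms of length L).
    """
    updates = 0
    for term in terms:
        for key in suffixes(term):
            index[key].append(doc_id)  # one posting-list update per suffix
            updates += 1
    return updates


# Hypothetical in-memory stand-in for the inverted table.
index = defaultdict(list)
updates = index_document(index, 1, ["search", "engine"])
print(updates)  # 12: two terms of length 6 each
```

With M = 2 terms of length L = 6, a single document already triggers 12 posting-list updates, and in the real store each update would touch a compressed value that may hold millions of ids.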

Have you run any benchmarks on a huge corpus? I wonder how it performs for both inserts and queries. Thank you~