pouchdb-community/pouchdb-quick-search

Changing search algorithm to better suit numbers

Closed this issue · 2 comments

I'm mainly searching across strings of numbers rather than English words, and it seems like the tf-idf algorithm isn't really suited to this. For example, if there is an entry with id = 123456 and I search for 3456, it doesn't show up as a result, even though no other document contains the string 3456.

How would one go about changing the search algorithm to something else?

You could write a custom Lunr tokenizer. In general, though, I'd say what you describe is a really hard problem, because there's a combinatorial explosion. Take the suffixes of 123456 alone:

6
56
456
3456
23456
123456

^ each of these has to be indexed, and you potentially have to do a prefix search on each one as well.
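
To make the list above concrete, here is a minimal TypeScript sketch of the idea: generate every suffix of a value, then treat a search as a prefix match against those suffixes. The function names (suffixes, containsViaSuffixes) are made up for illustration and are not part of Lunr or pouchdb-quick-search.

```typescript
// Return every suffix of `value`, longest first.
// A string of length n has n suffixes, so n terms must be indexed per value.
function suffixes(value: string): string[] {
  const result: string[] = [];
  for (let start = 0; start < value.length; start++) {
    result.push(value.slice(start));
  }
  return result;
}

// A query occurs somewhere inside `value` exactly when it is a prefix
// of at least one suffix of `value`.
function containsViaSuffixes(value: string, query: string): boolean {
  if (query.length === 0) {
    return false; // treat an empty query as "no match" in this sketch
  }
  return suffixes(value).some(
    (suffix) => suffix.slice(0, query.length) === query
  );
}

// Example: the six suffixes of "123456", one per line.
console.log(suffixes("123456").join("\n"));
```

A custom tokenizer would have to emit all of these terms for every number in every document, which is where the index size grows.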

If I were you, I would probably write a map/reduce query (https://pouchdb.com/guides/queries.html) to do this, since it supports prefix searching out of the box. You may find that the performance is very poor, though, because of all the substrings you need to index.
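
A rough sketch of what that map/reduce approach could look like, in TypeScript. It is only an illustration under some assumptions: the document field name (number) is hypothetical, and buildView and prefixQuery are in-memory stand-ins for a real view so the example runs without PouchDB. With PouchDB itself, the map function would go to db.query together with startkey and endkey options, as the linked guide describes.

```typescript
// Assumed document shape; `number` is a hypothetical field name.
interface Doc {
  _id: string;
  number: string;
}

interface Row {
  key: string;
  id: string;
}

// Map function: emit one key per suffix of the number.
function mapSuffixes(doc: Doc, emit: (key: string) => void): void {
  for (let start = 0; start < doc.number.length; start++) {
    emit(doc.number.slice(start));
  }
}

// In-memory stand-in for a view: run the map function over every doc and
// sort the rows by key (plain string comparison, enough for digit strings).
function buildView(docs: Doc[]): Row[] {
  const rows: Row[] = [];
  for (const doc of docs) {
    mapSuffixes(doc, (key) => {
      rows.push({ key: key, id: doc._id });
    });
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

// Prefix search as a key range: from `prefix` up to `prefix` followed by a
// very high code point, the same startkey/endkey trick used with db.query.
function prefixQuery(view: Row[], prefix: string): string[] {
  const startkey = prefix;
  const endkey = prefix + "\ufff0";
  const ids: string[] = [];
  for (const row of view) {
    if (row.key >= startkey && row.key <= endkey && ids.indexOf(row.id) === -1) {
      ids.push(row.id); // a doc can match through several suffixes; keep it once
    }
  }
  return ids;
}

const exampleDocs: Doc[] = [
  { _id: "a", number: "123456" },
  { _id: "b", number: "987" },
];
const exampleView = buildView(exampleDocs);
console.log(prefixQuery(exampleView, "3456"));
```

Note that the view holds one row per suffix per document, so it grows with the total length of all indexed numbers, which is the performance concern mentioned above.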

Closing as it is out-of-scope for the current project.