Changing search algorithm to better suit numbers
Closed this issue · 2 comments
I'm mainly searching across strings of numbers rather than English words, and it seems the tf-idf algorithm isn't really suited to this. For example, if there is an entry with `id = 123456` and I search for `3456`, it doesn't show up as a result, even though no other document contains the string `3456`.
How would one go about changing the search algorithm to something else?
You could write a custom Lunr tokenizer. In general, though, I'd say what you describe is a really hard problem, because there's a combinatorial explosion:
6
56
456
3456
23456
123456
^ each of these has to be indexed, and you potentially have to do a prefix search on each one as well.
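To make that concrete, here's a minimal sketch of the suffix generation such a custom tokenizer would have to do. The function name is illustrative and not part of Lunr's API; the idea is just that every trailing substring of a numeric token gets emitted as its own index term, so a later search for `3456` becomes a prefix match.

```javascript
// Sketch: emit every suffix of a numeric token as a separate term,
// so searching for a trailing substring (e.g. "3456" in "123456")
// reduces to a prefix lookup against the index. `suffixes` is a
// hypothetical helper, not a Lunr built-in.
function suffixes(token) {
  const out = [];
  for (let i = 0; i < token.length; i++) {
    out.push(token.slice(i));
  }
  return out;
}

console.log(suffixes('123456'));
// [ '123456', '23456', '3456', '456', '56', '6' ]
```

Note the cost: a token of length n produces n index terms, which is exactly the explosion listed above.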
If I were you, I would probably write a map/reduce query (https://pouchdb.com/guides/queries.html) to do this, since it supports prefix searching built-in. Although you may find that the performance is very bad because of all the substrings you need to index.
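As a rough sketch of that approach: a map function emits every suffix of the `id` field as a view key, and a prefix search then becomes a key-range query. The `startkey`/`endkey` range convention below matches PouchDB's query API, but the in-memory "view" here is just a stand-in so the idea can be shown without a live database; the field name `id` and helper names are assumptions.

```javascript
// Hedged sketch: a map function emitting every suffix of doc.id,
// plus a tiny in-memory stand-in for the materialized view. With a
// real database this would be db.query(mapFn, { startkey: prefix,
// endkey: prefix + '\ufff0' }); everything else is illustrative.
function mapFn(doc, emit) {
  const id = String(doc.id);
  for (let i = 0; i < id.length; i++) {
    emit(id.slice(i), doc._id);
  }
}

// Simulate building the view: collect and sort emitted rows by key.
function buildView(docs) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ key, value }));
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

// Prefix search = key range [prefix, prefix + '\ufff0'], following
// PouchDB's startkey/endkey convention for prefix matching.
function prefixSearch(rows, prefix) {
  const endkey = prefix + '\ufff0';
  return rows.filter((r) => r.key >= prefix && r.key <= endkey);
}

const view = buildView([
  { _id: 'doc1', id: 123456 },
  { _id: 'doc2', id: 789 },
]);
console.log(prefixSearch(view, '3456').map((r) => r.value));
// [ 'doc1' ]
```

Because every suffix is emitted, the view grows with the total length of all ids, which is where the performance cost mentioned above comes from.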
Closing as it is out-of-scope for the current project.