leeoniya/uFuzzy

SingleError mode should treat numbers specially

leeoniya opened this issue · 1 comment

it's probably not reasonable when searching for abc1236 to match abc136 or ab1c236 or abc123.

i guess my premise here is that spelling mistakes are different than typos. if you type gorila instead of gorilla you probably won't get any confusing matches for gorila. but if you type 1123, you most likely don't want to see a match for 123 or 1132, yet these records are quite likely to exist while being completely irrelevant, and they become noise.

i think the most straightforward way to improve here is to match numeric portions of the needle exactly as substrings and not allow mistakes in them. this can be done in the needle prep step by segmenting the alpha and numeric portions of terms and quote-enclosing the numbers.

abc123 becomes abc "123". the issue with turning this into two terms is that they can now be matched out of order if that setting is enabled. maybe not a big deal.

we can instead add support for exact substrings within terms which would turn the needle abc123 into abc"123", retaining a single term but requiring an exact 123 suffix here and preventing the SingleError behavior from crossing an alpha-num boundary. this seems like the better approach.
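a minimal sketch of the needle prep step described above, covering both rewrites. `quoteNumericRuns` and its `sep` parameter are hypothetical names for illustration, not part of uFuzzy's API, and the in-term form (`abc"123"`) is only the syntax proposed in this issue, so nothing here should be read as what the library actually accepts.

```javascript
// hypothetical helper (not uFuzzy API): quote every run of digits in each term.
// sep = ' ' splits a term into separate terms:  abc123 -> abc "123"
// sep = ''  keeps it as one term (proposed):    abc123 -> abc"123"
function quoteNumericRuns(needle, sep = ' ') {
  return needle
    .trim()
    .split(/\s+/)
    .filter(term => term.length > 0)
    .map(term =>
      // break the term into alternating digit and non-digit runs
      (term.match(/\d+|\D+/g) ?? [])
        .map(seg => (/^\d+$/.test(seg) ? `"${seg}"` : seg))
        .join(sep)
    )
    .join(' ');
}

console.log(quoteNumericRuns('abc123'));     // two-term form
console.log(quoteNumericRuns('abc123', '')); // single-term form
```

the sketch assumes the needle does not already contain quotes. with `sep = ' '` the pieces become independent terms, which is exactly what allows the out-of-order matching mentioned above.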

ended up just disabling fuzziness for all runs of numbers in SingleError mode. no new options.
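to illustrate the effect of that rule, here is a rough sketch that is not uFuzzy's actual implementation. it compares whole strings rather than searching for substrings, and `editDistance`, `runs` and `matchesDigitsExact` are hypothetical names. digit runs must be identical, and at most one error (insert, delete, substitute or adjacent swap) is allowed across the non-digit runs.

```javascript
// optimal string alignment distance: insert, delete, substitute, adjacent transpose
function editDistance(a, b) {
  const d = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = 0; i <= a.length; i++) d[i][0] = i;
  for (let j = 0; j <= b.length; j++) d[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[a.length][b.length];
}

// split a string into alternating digit and non-digit runs
const runs = s => s.match(/\d+|\D+/g) ?? [];

// hypothetical check: digit runs exact, at most one error in the rest
function matchesDigitsExact(needle, candidate) {
  const n = runs(needle);
  const c = runs(candidate);
  if (n.length !== c.length) return false; // an edit crossed an alpha-num boundary
  let errors = 0;
  for (let i = 0; i < n.length; i++) {
    const nIsNum = /^\d/.test(n[i]);
    const cIsNum = /^\d/.test(c[i]);
    if (nIsNum !== cIsNum) return false;
    if (nIsNum) {
      if (n[i] !== c[i]) return false; // no fuzziness inside numbers
    } else {
      errors += editDistance(n[i], c[i]);
    }
  }
  return errors <= 1;
}

for (const cand of ['abc136', 'ab1c236', 'abc123', 'abd1236']) {
  console.log(cand, matchesDigitsExact('abc1236', cand));
}
```

under a plain one-error rule, 1123 is a single edit away from both 123 and 1132, which is the noise described earlier. with digit runs held exact those candidates are rejected, while gorila still matches gorilla.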