leeoniya/uFuzzy

Single Error mode only allowing exact matches under 5 characters

ascendedguard opened this issue · 3 comments

Common use case I'm running into: searching a list of names that contains "Nicholas". Searching for "Nick" does not match, even though it's only 1 character off, while searching for "Nicko" does. intraMode is set to 1 (SingleError).

You can see this on the demo: searching "Niko Eymerich" does not show the expected results (screenshot), while adding a 5th letter with "Nikol Eymerich" shows the correct results (screenshot).

Using latest version 1.0.14

yeah, by default there are stricter requirements for short terms, because you generally don't want "oct" to match "cot" and "ot" substrings.

you can modify these defaults on a per-term basis using the intraRules option (the defaults are linked below).

more discussion in: #39, #44

i'm certainly open to improving the defaults when it makes sense. for example, if you only want to do prefix matching for each term (interLft: 1 or interLft: 2) rather than arbitrary substrings, we can relax the rules for short terms. but as i've said in the other issues, i'm quite hesitant to introduce additional custom short-term behavior that must be understood under different option combos.

uFuzzy/src/uFuzzy.js

Lines 156 to 198 in ec1c44b

	intraRules = p => {
		// default is exact term matches only
		let _intraSlice = OPTS.intraSlice, // requires first char
			_intraIns = 0,
			_intraSub = 0,
			_intraTrn = 0,
			_intraDel = 0;

		// only-digits strings should match exactly, else special rules for short strings
		if (/[^\d]/.test(p)) {
			let plen = p.length;

			// prevent junk matches by requiring stricter rules for short terms
			if (plen <= 4) {
				if (plen >= 3) {
					// one swap in non-first char when 3-4 chars
					_intraTrn = Math.min(intraTrn, 1);

					// or one insertion when 4 chars
					if (plen == 4)
						_intraIns = Math.min(intraIns, 1);
				}
				// else exact match when 1-2 chars
			}
			// use supplied opts
			else {
				_intraSlice = intraSlice;
				_intraIns = intraIns,
				_intraSub = intraSub,
				_intraTrn = intraTrn,
				_intraDel = intraDel;
			}
		}

		return {
			intraSlice: _intraSlice,
			intraIns: _intraIns,
			intraSub: _intraSub,
			intraTrn: _intraTrn,
			intraDel: _intraDel,
		};
	};
}
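To make the effect on the terms from this issue concrete, here is a minimal standalone sketch of the same branching. It does not call uFuzzy; `defaultRulesFor`, `IntraOpts`, and `userOpts` are hypothetical names, and it assumes that the library default slice is `[1, Infinity]` and that intraMode: 1 turns each of the four error types on.

```typescript
// Standalone re-statement of the default short-term rules quoted above.
// Names here are illustrative and are not part of the uFuzzy API.
type IntraOpts = {
    intraSlice: [number, number];
    intraIns: number;
    intraSub: number;
    intraTrn: number;
    intraDel: number;
};

// Assumption: what a caller effectively supplies with intraMode: 1.
const userOpts: IntraOpts = {
    intraSlice: [1, Infinity],
    intraIns: 1,
    intraSub: 1,
    intraTrn: 1,
    intraDel: 1,
};

function defaultRulesFor(p: string, o: IntraOpts): IntraOpts {
    // start from exact matching (assumed default slice: first char required)
    const r: IntraOpts = {
        intraSlice: [1, Infinity],
        intraIns: 0,
        intraSub: 0,
        intraTrn: 0,
        intraDel: 0,
    };

    // digit-only terms stay exact
    if (/[^\d]/.test(p)) {
        const plen = p.length;

        if (plen <= 4) {
            if (plen >= 3) {
                // 3-4 chars: at most one transposition
                r.intraTrn = Math.min(o.intraTrn, 1);
                // 4 chars: additionally at most one insertion
                if (plen === 4) r.intraIns = Math.min(o.intraIns, 1);
            }
            // 1-2 chars: stay exact
        } else {
            // 5+ chars: use the caller's options unchanged
            return { ...o };
        }
    }

    return r;
}

console.log(defaultRulesFor("Nick", userOpts));  // intraSub and intraDel are 0
console.log(defaultRulesFor("Nicko", userOpts)); // all four error types are 1
```

Under these assumptions the 4-character term only gets a transposition or an insertion, so a substitution such as "k" for "h" is not allowed until the term reaches 5 characters.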

Thanks, this was what I needed.

This is basically what I ended up using, with some special shenanigans so that the method's return type played nice in TypeScript.

// custom uFuzzy rules for short strings, assuming intraMode = 1
const uFuzzyIntraRules = (p: string) => { 
    // note: this is assuming intraMode = 1
    const settings: {
        intraSlice: IntraSliceIdxs;
        intraIns: 0 | 1;
        intraSub: 0 | 1;
        intraTrn: 0 | 1;
        intraDel: 0 | 1;
    } = {
        intraSlice: [1, Infinity],
        intraIns: 1,
        intraSub: 1,
        intraTrn: 1,
        intraDel: 1,
    }

    // terms containing a non-digit get special rules when short;
    // note that digit-only terms skip this block and keep the settings above
    if (/[^\d]/.test(p)) {
        const plen = p.length;

        // 3/4 letter terms: allow a single substitution after the first two chars,
        // which is more flexible than the default (transposition/insertion only)
        if (plen === 3 || plen === 4) {
            settings.intraSlice = [2, Infinity];
            settings.intraTrn = 0;
            settings.intraIns = 0;
            settings.intraDel = 0;
        }
    }

    return settings;
}

const opts: uFuzzy.Options = {
    intraMode: 1, // Single errors allowed
    intraRules: uFuzzyIntraRules,
    interLft: 2, // Loose left word boundary
};
const fuzzySearch = new uFuzzy(opts);
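
As a rough illustration of why these rules help with the original example, the sketch below checks whether a term equals the start of a word up to one substituted character, with the first `sliceStart` characters required to match exactly. This is not how uFuzzy matches internally, and `matchesWithOneSub` is a hypothetical helper that only looks at the word's prefix.

```typescript
// Illustration only: a simplified single-substitution prefix check.
// sliceStart mirrors the first index of intraSlice (chars before it must be exact).
function matchesWithOneSub(term: string, word: string, sliceStart: number): boolean {
    if (word.length < term.length) return false;

    let diffs = 0;
    for (let i = 0; i < term.length; i++) {
        if (term.charAt(i).toLowerCase() !== word.charAt(i).toLowerCase()) {
            // a mismatch before the slice start is never allowed
            if (i < sliceStart) return false;
            diffs++;
        }
    }
    // zero or one substituted character is accepted
    return diffs <= 1;
}

console.log(matchesWithOneSub("Nick", "Nicholas", 2)); // true: only "k" vs "h" differs, at index 3
console.log(matchesWithOneSub("Nack", "Nicholas", 2)); // false: mismatch at index 1 is before the slice
```

With intraSlice starting at 2, the first two characters act as an exact anchor, which limits junk matches for short terms while still tolerating one wrong character later in the term.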

I may do a PR soon just with some minor documentation suggestions from figuring out the library over the last day. Thanks for the library!

👍