leeoniya/uFuzzy

`info` fails after multiple filters


Hi!

First of all, thank you so much, your library is amazing!

However, I ran into an issue when trying to do a slightly custom search, because `search` wasn't behaving the way I wanted.

What I was trying to do is run a `filter` for each word in the needle, reusing the returned indexes on each iteration (quite similar to using `search` directly, but not exactly the same).

The problem appears when you have the needle `j n`

and the haystack:

    [
      "bob",
      "flower John blue", // this breaks `info` since only one word (John) has a `j` and an `n`
      "John John John", // is okay, but a `j j j j` needle would then break this line too
    ]

The filters return the correct index: `1`.

But `info` cannot internally handle a split needle whose terms all point to the same word.

I just realized that I was totally wrong!

Even the needle `John flower` breaks `info`, since it searches for `flower` after `John`.

Actually, wrong again; it's quite confusing! I'll update the issue once I understand the problem 100%.

Even an `f j` needle breaks `info`, and I really don't understand why.


Here's the code in case it helps, but I'm really not doing anything crazy:

    const max = 1000

    let indexes: HaystackIdxs | undefined = undefined

    const sortedWords = needle
      .split(' ')
      .filter((word) => word.length)
      .sort((a, b) => b.length - a.length)

    for (let wordIndex = 0; wordIndex < sortedWords.length; wordIndex++) {
      if (indexes?.length === 0) break

      try {
        indexes = fuzzy.filter(haystack, sortedWords[wordIndex], indexes) || []
      } catch (error) {
        console.error(error)
      }
    }

    indexes = indexes?.slice(0, max)

    // Indexes are perfect, `filter`s work as expected

    if (indexes) {
      const info = fuzzy.info(indexes, haystack, needle)
      
      const order = fuzzy.sort(info, haystack, needle)

      for (let i = 0; i < order.length; i++) {
        let infoIdx = order[i]

        const [title, path] = uFuzzy
          .highlight(haystack[info.idx[infoIdx]], info.ranges[infoIdx])
          .split('\t')

        list.push({
          preset: presets.list[info.idx[infoIdx]],
          path,
          title,
        })
      }
    }
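
To make the narrowing loop above easier to reason about outside the app, here is a dependency-free sketch of the same idea. It is a stand-in, not uFuzzy: `subsequenceMatch` is a hypothetical and much simpler per-term matcher than `fuzzy.filter`, and `narrowByTerms` only mirrors the loop's structure (split on spaces, longest term first, re-check only the survivors of each pass). What it illustrates is that each term is matched independently, whereas the later `fuzzy.info(indexes, haystack, needle)` call receives the whole needle at once.

```typescript
// Hypothetical stand-in for a per-term fuzzy match: true when every
// character of `term` appears in `hay` in order (case-insensitive).
// This is NOT uFuzzy's matching logic, only an illustration.
function subsequenceMatch(hay: string, term: string): boolean {
  const h = hay.toLowerCase();
  const t = term.toLowerCase();
  let pos = 0;
  for (const ch of t) {
    pos = h.indexOf(ch, pos);
    if (pos === -1) return false;
    pos += 1;
  }
  return true;
}

// Narrow candidate indexes term by term, longest term first, mirroring
// the loop in the issue: each pass only re-checks survivors of the last.
function narrowByTerms(haystack: string[], needle: string): number[] {
  const terms = needle
    .split(" ")
    .filter((t) => t.length > 0)
    .sort((a, b) => b.length - a.length);

  let idxs = haystack.map((_, i) => i);
  for (const term of terms) {
    if (idxs.length === 0) break;
    idxs = idxs.filter((i) => subsequenceMatch(haystack[i], term));
  }
  return idxs;
}
```

Because the terms are checked one at a time, their order in the needle does not matter to this step; only a later step that sees the full needle can care about order.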

let's rewind a bit here and first make sure you actually need a custom strategy :)

without code, can you provide the haystack, the needle, and what you expect to get back, and what you're actually getting back?

Sure! Sorry, it got a bit messy 😅

So I have a tree of things in my app for easy discoverability, very similar to a file tree. Each thing has a path, a name, and a title.
I wanted to implement a search to make it easier to find what you know you're looking for without having to open all the "folders".

So the haystack is:

    tree.flatten().map((thing) => `${thing.title}\t${thing.path}`)

Example:

    [
      "Temporary\t/tmp",
      "Temporary file\t/tmp/f",
    ]
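
A typed version of that haystack construction might look like the sketch below. `TreeNode`, `flatten`, and `buildHaystack` are hypothetical names standing in for the app's own tree type and its `tree.flatten()`; the only detail taken from the issue is that each line is the title and the path joined by a tab, which is what allows a highlighted line to be split back apart with `.split('\t')`.

```typescript
// Hypothetical shape of a node in the app's tree; the issue only says
// each thing has a path, a name and a title.
interface TreeNode {
  name: string;
  title: string;
  path: string;
  children?: TreeNode[];
}

// Depth-first flatten, standing in for the app's `tree.flatten()`.
function flatten(nodes: TreeNode[]): TreeNode[] {
  const out: TreeNode[] = [];
  for (const node of nodes) {
    out.push(node);
    if (node.children) out.push(...flatten(node.children));
  }
  return out;
}

// One haystack line per node: title and path joined by a tab.
function buildHaystack(nodes: TreeNode[]): string[] {
  return flatten(nodes).map((n) => `${n.title}\t${n.path}`);
}
```

Keeping the separator a character that never appears in titles or paths (here a tab) is what makes the split after highlighting safe.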

I expect the needle `tmp temp` to return both lines. That works with `search` only when the `outOfOrder` option is set, but then the highlighting doesn't work well. The needle `Temporary file` doesn't highlight `Temporary file`, which was really surprising to me. I may have had some other issues too, but now I'm not sure whether it was me, my code, a misunderstanding on my part, or really the library (I tried many things, haha!).

So then I tried the example with the manual `filter`/`info`/`sort` steps, and with that `Temporary file` is highlighted as expected. But I did want the out-of-order functionality, so I looked at how `search` was implemented and "borrowed" the idea of just running `filter` multiple times, and now `tmp temp` worked too (with highlighting!).

And so I tried several things that broke `info`. Between posting the issue and this comment, I realized that I could just use the order and skip the highlighting when I have to, which is much less inconvenient... a brain fart on my part. Sorry!

But there are still cases that break `info` when using several filters, for example:

For haystack `["apple orange orange"]` and needle `apple ora`, I expect `apple` and both `ora` to be highlighted (or even just the first `ora`).

For haystack `["apple orange"]` and needle `a o`, I expect both `a` and `o` to be highlighted, since when searching for `a` alone, the `a` is indeed highlighted. (Maybe the `a` in `orange` too, but that one is much more ambiguous and up for debate.)

For haystack `["......... / apple / orange / banana / lime / .........."]` and needle `orange lime apple banana`, I expect each word to be highlighted.
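
The behavior expected in these three cases can be modeled as: give every term its own highlight range, in any order, without letting two terms claim the same characters. The sketch below does that with plain, case-insensitive substring search, which is an assumption for illustration; uFuzzy's matching is fuzzy and its `info`/`highlight` internals are not reproduced here. `termRanges` returns flat `[start, end, ...]` pairs sorted by start and `markRanges` wraps them in `<mark>` tags; both names are hypothetical.

```typescript
// For each term, claim the first case-insensitive substring occurrence
// that does not overlap a range already claimed by an earlier term.
// Returns flat [start, end, ...] pairs sorted by start, or null if some
// term cannot be placed. Exact substring matching is an assumption.
function termRanges(hay: string, terms: string[]): number[] | null {
  const h = hay.toLowerCase();
  const claimed: Array<[number, number]> = [];
  const overlaps = (s: number, e: number): boolean =>
    claimed.some(([cs, ce]) => s < ce && cs < e);

  for (const term of terms) {
    const t = term.toLowerCase();
    if (t.length === 0) continue; // ignore empty terms
    let start = h.indexOf(t);
    while (start !== -1 && overlaps(start, start + t.length)) {
      start = h.indexOf(t, start + 1);
    }
    if (start === -1) return null;
    claimed.push([start, start + t.length]);
  }

  claimed.sort((a, b) => a[0] - b[0]);
  const flat: number[] = [];
  for (const [s, e] of claimed) flat.push(s, e);
  return flat;
}

// Wrap each [start, end) range in <mark> tags, copying the rest verbatim.
function markRanges(hay: string, ranges: number[]): string {
  let out = "";
  let last = 0;
  for (let i = 0; i < ranges.length; i += 2) {
    out += hay.slice(last, ranges[i]);
    out += "<mark>" + hay.slice(ranges[i], ranges[i + 1]) + "</mark>";
    last = ranges[i + 1];
  }
  return out + hay.slice(last);
}
```

Under this model, `j j j` on `John John John` gets three separate ranges, while `j j j j` has nowhere to put the fourth term and returns `null`, which is one way to see why repeated terms pointing at the same word are awkward to highlight.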

> The needle `Temporary file` doesn't highlight `Temporary file`, which was really surprising to me.

is it possible that you had > 1000 results for `Temporary file`, so it exceeded the default `infoThresh` setting and bailed out after the `filter` step?

you can see that there is no ranking or ordering until you have <= 1000 results:

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy&search=su
vs
https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy&search=sup

if you change `infoThresh` to 10k, you'll see it rank and sort with higher match counts. imo it's generally a waste of CPU to always do this, but use cases vary, obviously.
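
The threshold behavior described here can be summarized in a few lines. This is a simplified model, not uFuzzy's source: `gatedSearch` and `rank` are hypothetical, with `rank` standing in for the `info` + `sort` steps that also produce the highlight ranges. Filtering always happens; ranking (and therefore highlighting) only happens when the match count is at or below `infoThresh`, so ranking can switch on or off as a query's match count crosses that threshold.

```typescript
interface SearchResult {
  idxs: number[];
  ranked: boolean; // false when the ranking/highlighting step was skipped
}

// Simplified model: the filtered indexes are always returned, but the
// costlier ranking step only runs when there are <= infoThresh matches.
// `rank` is a hypothetical stand-in for uFuzzy's info + sort steps.
function gatedSearch(
  matchedIdxs: number[],
  rank: (idxs: number[]) => number[],
  infoThresh: number = 1000
): SearchResult {
  if (matchedIdxs.length > infoThresh) {
    return { idxs: matchedIdxs, ranked: false };
  }
  return { idxs: rank(matchedIdxs), ranked: true };
}
```

Raising the threshold trades CPU for always-ranked results, which is the trade-off mentioned above.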

For this one case it's quite possible, but I can find many cases where I expect highlighting to work with < 1000 results and it doesn't. It's okay though, since the correct results do appear! It's just quite disturbing at first to have the highlighting and then see it disappear because you added a letter.

> I expect the needle `tmp temp` to return both lines which works with `search` only with the `outOfOrder` option set but then the highlighting doesn't work well. The needle `Temporary file` doesn't highlight `Temporary file` which was really surprising to me.

here is your haystack file drag-dropped into the demo, and working as expected.

test-list.json

https://leeoniya.github.io/uFuzzy/demos/compare?libs=uFuzzy&outOfOrder&search=tmp%20temp


Thanks a lot for your help! After playing a lot with the demo site, I found exactly what I needed for my use case:

First pass:

  • intraIns: Inf
  • intraMode: MultiInsert
  • outOfOrder: true

If that finds nothing, run the same search again with `intraMode: SingleError`.
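
The two-pass strategy above is a plain fallback chain. A minimal sketch, assuming each pass is wrapped in a function that returns matching haystack indexes (for example, one configured as in the first pass and one with `intraMode: SingleError`). `SearchPass` and `searchWithFallback` are hypothetical names, and no uFuzzy calls are shown because the exact option wiring depends on how the app constructs its instances.

```typescript
// A "pass" is any search function returning matching haystack indexes.
// In the setup above, pass 1 would be the MultiInsert/outOfOrder search
// and pass 2 the SingleError one; here they are hypothetical functions.
type SearchPass = (haystack: string[], needle: string) => number[];

// Run passes in order and return the first non-empty result, along with
// which pass produced it (-1 if none matched).
function searchWithFallback(
  passes: SearchPass[],
  haystack: string[],
  needle: string
): { idxs: number[]; passIndex: number } {
  for (let p = 0; p < passes.length; p++) {
    const idxs = passes[p](haystack, needle);
    if (idxs.length > 0) return { idxs, passIndex: p };
  }
  return { idxs: [], passIndex: -1 };
}
```

Returning `passIndex` lets the UI know which pass produced the results, for example to render them differently.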