`info` fails after multiple filters
Closed this issue · 6 comments
Hi !
First of all, thank you so much, your library is amazing !
But I ran into an issue when trying to do a slightly custom search, as the search wasn't behaving the way I wanted.
What I was trying to do is run `filter` for each word in the needle, reusing the returned indexes for each iteration (quite similar to using `search` directly, but not identical).
The problem is when you have the needle `j n` and a haystack:
[
  "bob",
  "flower John blue", // this breaks `info` since only one word (John) has a `j` and an `n`
  "John John John", // is okay but a `j j j j` needle would then break this line also
]
The filters give the right index: 1.
But `info` cannot internally handle a split needle whose parts point to the same word.
I just realized that I was totally wrong! Even the needle `John flower` breaks `info`, as it's searching for `flower` after `john`.
Actually, wrong again, it's quite confusing! I'll update the issue once I understand the problem 100%.
Even an `f j` needle breaks `info`, and I really don't understand why.
Here's the code if it can help but I'm really not doing anything crazy:
const max = 1000
let indexes: HaystackIdxs | undefined = undefined
const sortedWords = needle
  .split(' ')
  .filter((word) => word.length)
  .sort((a, b) => b.length - a.length)
for (let wordIndex = 0; wordIndex < sortedWords.length; wordIndex++) {
  if (indexes?.length === 0) break
  try {
    indexes = fuzzy.filter(haystack, sortedWords[wordIndex], indexes) || []
  } catch (error) {
    console.error(error)
  }
}
indexes = indexes?.slice(0, max)
// Indexes are perfect, `filter`s work as expected
if (indexes) {
  const info = fuzzy.info(indexes, haystack, needle)
  const order = fuzzy.sort(info, haystack, needle)
  for (let i = 0; i < order.length; i++) {
    let infoIdx = order[i]
    const [title, path] = uFuzzy
      .highlight(haystack[info.idx[infoIdx]], info.ranges[infoIdx])
      .split('\t')
    list.push({
      preset: presets.list[info.idx[infoIdx]],
      path,
      title,
    })
  }
}
let's rewind a bit here and first make sure you actually need a custom strategy :)
without code, can you provide the haystack, the needle, and what you expect to get back, and what you're actually getting back?
Sure ! Sorry, it got a bit messy 😅
So I have a tree of things in my app for easy discoverability. Very similar to a file tree. Each thing has a `path`, a `name` and a `title`.
I wanted to implement a search to make it easier to find what you know you are looking for without having to open all the "folders".
So the haystack is: tree.flatten().map((thing) => `${thing.title}\t${thing.path}`)
Example:
[
"Temporary\t/tmp",
"Temporary file\t/tmp/f",
]
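As a rough illustration of how such a haystack could be built, here is a minimal sketch. The `Thing` interface, its `children` field, and the `flattenTree` and `toHaystack` helpers are assumptions made for this example; the thread only mentions that each thing has a path, a name and a title, and that the tree has some `flatten()` method.

```typescript
// Hypothetical shape of one node in the tree; only `title` and `path` end up in the haystack.
interface Thing {
  title: string;
  path: string;
  children?: Thing[];
}

// Depth-first flattening: each node first, then its descendants.
function flattenTree(root: Thing): Thing[] {
  const out: Thing[] = [root];
  for (const child of root.children ?? []) {
    out.push(...flattenTree(child));
  }
  return out;
}

// One haystack line per node: title, a tab, then the path.
function toHaystack(things: Thing[]): string[] {
  return things.map((thing) => `${thing.title}\t${thing.path}`);
}

const exampleTree: Thing = {
  title: "Temporary",
  path: "/tmp",
  children: [{ title: "Temporary file", path: "/tmp/f" }],
};

const exampleHaystack = toHaystack(flattenTree(exampleTree));
```

Joining both fields with a tab keeps one string per item, so a highlighted result can later be split on the tab to recover the title and the path separately.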
I expect the needle `tmp temp` to return both lines, which works with `search` only with the `outOfOrder` option set, but then the highlighting doesn't work well. The needle `Temporary file` doesn't highlight `Temporary file`, which was really surprising to me. And I may have had some other issues, but now I'm not sure whether it was me, my code, some misunderstanding on my part, or really the library (I tried many things haha!)
So then I tried the example with the manual filter/info/order, and with that `Temporary file` is highlighted as expected, but I did want the out-of-order functionality, so I looked at how `search` was implemented and "took" the idea of just running `filter` multiple times, and so `tmp temp` now worked too (and with the highlighting!)
And so I tried several things which broke `info`. Between posting the issue and this comment I realized that I could just use the order without the highlighting when I must, which is much less inconvenient... a brain fart on my part. Sorry!
But there are still things that properly break `info` with several filters, for example:
For
haystack: ["apple orange orange"]
needle: apple ora
I expect `apple` and both `ora` to be highlighted (or even just the first `ora`)
For
haystack: ["apple orange"]
needle: a o
I expect both `a` and `o` to be highlighted (since when searching only for `a`, the `a` is indeed highlighted), and maybe the `a` in orange, but that one is much more ambiguous and up for debate
For
haystack: ["......... / apple / orange / banana / lime / .........."]
needle: orange lime apple banana
I expect to have each word highlighted
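To make these expectations concrete, here is a minimal sketch of order-independent highlighting. It is not uFuzzy's `info` or `highlight`: `wordRanges` and `markRanges` are hypothetical helpers that use plain case-insensitive substring matching (no fuzzy matching at all), treat each word of the needle independently, and merge the resulting ranges.

```typescript
// [start, end) character range inside the haystack item; end is exclusive.
type CharRange = [number, number];

// Hypothetical helper: find every occurrence of every needle word, in any order.
// Assumes lowercasing does not change string length (true for ASCII text).
function wordRanges(text: string, needle: string): CharRange[] {
  const lower = text.toLowerCase();
  const words = needle.toLowerCase().split(" ").filter((w) => w.length > 0);
  const ranges: CharRange[] = [];

  for (const word of words) {
    let from = lower.indexOf(word);
    while (from !== -1) {
      ranges.push([from, from + word.length]);
      from = lower.indexOf(word, from + word.length);
    }
  }

  // Sort by start, then merge overlapping or touching ranges.
  ranges.sort((a, b) => a[0] - b[0]);
  const merged: CharRange[] = [];
  for (const [start, end] of ranges) {
    const last = merged[merged.length - 1];
    if (last !== undefined && start <= last[1]) {
      last[1] = Math.max(last[1], end);
    } else {
      merged.push([start, end]);
    }
  }
  return merged;
}

// Hypothetical helper: wrap each range in <mark> tags.
function markRanges(text: string, ranges: CharRange[]): string {
  let out = "";
  let pos = 0;
  for (const [start, end] of ranges) {
    out += text.slice(pos, start) + "<mark>" + text.slice(start, end) + "</mark>";
    pos = end;
  }
  return out + text.slice(pos);
}
```

With this sketch, `apple ora` against `apple orange orange` marks `apple` and both `ora`, and `a o` against `apple orange` marks the leading `a`, the `o`, and also the debatable `a` inside `orange`, since every occurrence is collected.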
> The needle `Temporary file` doesn't highlight `Temporary file`, which was really surprising to me.
is it possible that you had > 1000 results for `Temporary file`, so it never crossed the default `infoThresh` setting and bailed after the filter step?
you can see that there is no ranking or ordering until you have <= 1000 results:
https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy&search=su
vs
https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy&search=sup
if you change `infoThresh` to 10k, you'll see it rank and sort with higher match counts. imo it's generally a waste of cpu to always do this, but use cases vary, obviously.
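The gating described here can be modelled with a small sketch. This is not uFuzzy's code: `filterThenMaybeRank` is a hypothetical function whose filter step is a plain substring match and whose ranking step (shorter items first) is an arbitrary stand-in. Only the idea of skipping the ranking work when the match count exceeds `infoThresh` comes from the thread.

```typescript
interface GatedResult {
  idxs: number[];          // haystack indexes that passed the filter step
  order: number[] | null;  // positions into `idxs` in ranked order, or null if ranking was skipped
}

function filterThenMaybeRank(
  haystack: string[],
  needle: string,
  infoThresh: number = 1000,
): GatedResult {
  const n = needle.toLowerCase();

  // Stand-in filter step: case-insensitive substring match.
  const idxs: number[] = [];
  for (let i = 0; i < haystack.length; i++) {
    if (haystack[i].toLowerCase().indexOf(n) !== -1) idxs.push(i);
  }

  // Too many matches: bail out after filtering, with no ranking and nothing to highlight.
  if (idxs.length > infoThresh) {
    return { idxs, order: null };
  }

  // Stand-in ranking step: shorter haystack items first.
  const order = idxs
    .map((_, pos) => pos)
    .sort((a, b) => haystack[idxs[a]].length - haystack[idxs[b]].length);

  return { idxs, order };
}
```

In this model a short needle that matches more items than the threshold returns `order: null`, while adding one more letter can drop the count under the threshold and make ranking appear, which mirrors the difference between the two demo links above.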
For this one case it's quite possible, but I can find you many cases where I expect highlighting to work with < 1000 results and it doesn't. It's okay though, as correct results do appear! It's just quite disturbing at first to have highlighting and then see it disappear because you added a letter.
> I expect the needle `tmp temp` to return both lines, which works with `search` only with the `outOfOrder` option set, but then the highlighting doesn't work well. The needle `Temporary file` doesn't highlight `Temporary file`, which was really surprising to me.
here is your haystack file drag-dropped into the demo, and working as expected.
https://leeoniya.github.io/uFuzzy/demos/compare?libs=uFuzzy&outOfOrder&search=tmp%20temp
Thanks a lot for your help! After playing a lot with the website, I found exactly what I needed for my use case:
First pass:
- intraIns: Inf
- intraMode: MultiInsert
- outOfOrder: true
If nothing is found, then the same thing with intraMode: SingleError
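The two-pass strategy above can be sketched generically as "try each pass in turn and keep the first non-empty result". Everything named here is an assumption for illustration: `searchWithFallback`, `substringPass` and `subsequencePass` are hypothetical, and the two passes are simple stand-in matchers, not the MultiInsert and SingleError modes. In real use each pass would presumably wrap a separately configured uFuzzy instance.

```typescript
// A pass takes the haystack and needle and returns matching haystack indexes.
type SearchPass = (haystack: string[], needle: string) => number[];

// Stand-in pass: case-insensitive substring match.
const substringPass: SearchPass = (haystack, needle) => {
  const n = needle.toLowerCase();
  const hits: number[] = [];
  for (let i = 0; i < haystack.length; i++) {
    if (haystack[i].toLowerCase().indexOf(n) !== -1) hits.push(i);
  }
  return hits;
};

// Stand-in pass: needle characters appear in order, with any gaps between them.
const subsequencePass: SearchPass = (haystack, needle) => {
  const n = needle.toLowerCase();
  const hits: number[] = [];
  for (let i = 0; i < haystack.length; i++) {
    const text = haystack[i].toLowerCase();
    let matched = 0;
    for (let j = 0; j < text.length && matched < n.length; j++) {
      if (text[j] === n[matched]) matched++;
    }
    if (matched === n.length) hits.push(i);
  }
  return hits;
};

// Run the passes in order and return the first non-empty result.
function searchWithFallback(
  haystack: string[],
  needle: string,
  passes: SearchPass[],
): number[] {
  for (const pass of passes) {
    const hits = pass(haystack, needle);
    if (hits.length > 0) return hits;
  }
  return [];
}
```

Later passes only run when the earlier ones find nothing, so the cost of the fallback is paid only for needles that would otherwise return an empty list.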