Filter caching based on prefix

I mentioned in another issue that orderless flex is too slow for me. It's especially slow in things like M-x that have a significant number of results.

I don't know how feasible it would be, but I wonder about maintaining the previous results and match string and if the previous match string is the prefix of the current match string, start with those results and filter them, rather than starting with the entire set.

For example, if I type to, the filtered results would be cached, then when I added another g, making the search string tog, the previous results that were cached would be fed into the filter, rather than the initial set of all commands.

Backspacing would still be slow, though the caching mechanism could be elaborated to be a stack so the first several backspaces would be instant.

Yeah, orderless-flex as well as the built-in flex actually stalls out for me for certain completions. I don't necessarily think the first completion (empty completion) is the longest time though. Filtering out a smaller collection with a long flex pattern abcde etc etc takes exponentially longer with each letter. (I've seen upwards of 20 seconds to come back from filtering with flex/orderless-flex).

I use (setq fussy-filter-fn 'fussy-filter-fast) which is not as exhaustive but works well enough for me.

As for caching, I'd have to think about it a little more, as mentioned slow parts of the filtering may appear later in the query.

Not super sure though, maybe matches are exhaustive if we 1. cache the initial empty string set and then score across the set. I'd encourage you to look at it a little further if it interests you. I may take a look at myself in the future.

fussy-filter-fast is similarly slow for me. It's not usable for corfu completion in elisp, for example.

I agree and have also observed that longer queries are slower, but as I understand it, those longer queries run on the entire set. If they only ran on the previous query's result set, it can't but be faster. Slow match time * 5 is going to be 10 times faster than Slow match time * 50.

I may try something out at some point, but right now I have no idea how the completion table stuff works so that's my biggest hurdle.

fussy-filter-fast is similarly slow for me. It's not usable for corfu completion in elisp, for example.

Definitely true...

If they only ran on the previous query's result set, it can't but be faster. Slow match time * 5 is going to be 10 times faster than Slow match time * 50.

Yeah, I'd have to think harder if we're missing any matches that way is all, If we're 'exhaustive', then we should definitely do this.

I may try something out at some point, but right now I have no idea how the completion table stuff works so that's my biggest hurdle.

It's definitely confusing...

Hotfuzz has a smaller codebase and could be used to acquaint yourself a little with what the all-completions interface expects:

https://github.com/axelf4/hotfuzz/blob/a19395aca9eff0e31c51efbbe4c6e16229f3b380/hotfuzz.el#L169

all-completions is the entry point:

fussy/fussy.el

Line 424 in 8e32eb2

(defun fussy-all-completions (string table pred point)
The filtering is done here:

fussy/fussy.el

Line 463 in 8e32eb2

(funcall fussy-filter-fn

It's a little tedious to walk the code because it's doing a 'funcall instead of calling a function directly but following the any of the fussy-filter-fn s there will work for grokking the code. Think you might even be able to advise one of the fussy-filter-fn s here and build the cache here. (Afterall, you'll have the list of candidates returned as well as the pattern/query which is probably all you need for the cache)
Filtering is done, we now proceed to scoring:

fussy/fussy.el

Line 477 in 8e32eb2

(fussy-score all infix cache))
Scoring scores using flx/fzf/fuz/etc and then sets 'completion-score. Sorting happens later (in the next step)
'completion--adjust-metadata actually sorts the candidates by 'completion-score text property
Highlighting can be ignored for now.

start with those results and filter them, rather than starting with the entire set.

I wonder if we can use a hashtable instead and store prefix -> result. If you backspace, for example 'a' -> 'ab' - 'a', presumably, you'd already have results for 'a' and 'ab', considering you already did the work of processing 'a' and 'ab' given a completion table.

Thinking more for the completing-read case though and not the completion-in-region case where there may be more to consider when caching.

Yeah, it's just memory at that point. Is there a way to know when the "completion session" is done so we can clear the table? We don't want to reuse between different completing-reads, for example.

Is there a way to know when the "completion session" is done so we can clear the table? We don't want to reuse between different completing-reads, for example.

Not sure, at least for completing-read though, we could probably use a buffer-local. I believe one completion session == 1 minibuffer and that it gets called/created everytime for one 'session'. Could be wrong though, we'd have to verify those assumptions.

With completion-in-region, maybe company/vertico have hooks for that.

With the forward case (e.g. 'a' -> 'ab' -> 'abc'), maybe it's sufficient to do something like, get current user entered query, e.g. user has entered up to 'abc', we check if 'abc' is in our result cache, use that. If it isn't we can check the 'abc'-minus 1 string instead which is 'ab' in this case and use that as the 'filtered portion' of the collection.

IOW:

current query: 'abc'

if 'abc' in table -> (past result) -> use cache result as all-completions result, skip filtering and scoring
else if 'ab' in table -> (new query) -> use cache result as the filtered candidates, skip filtering -> score

Not really sure if there are holes anywhere in this line of thinking though. Guarantee there'd be bugs ironing this out...

Can try this later, but is it possible the input table is constant throughout a session such that we could do object identity equality on it?

Not sure about that.