cyrusimap/cyrus-imapd

When using squatter indexes in 3.4.3, body searches return all not-yet-indexed messages


How to reproduce:

Run squatter to index a mailbox.

Do a body search -- only matching messages are returned.

Add some messages to the mailbox that don't match your search.

Do the search again -- now the matching messages plus the new, non-matching messages are returned.

rsto commented

This is by design. The rationale is to prefer false positives over false negatives. If you run a rolling squatter, then the number of false positives should be negligible for any sane mail load.

No. It is a bug.

Yes, the squatter index is supposed to return false positives, but imapd is supposed to filter them out before returning the list of messages to the client.

Ref: https://www.mail-archive.com/info-cyrus@lists.andrew.cmu.edu/msg38103.html

The squatter index isn't a perfect index. What it does is given a search
term, it returns a list of messages that might contain the term, and
excludes messages that definitely do not contain the search term. For each
message that squatter says might contain the search term, cyrus then opens
the message and does a complete search on it to see if it definitely
contains the search term.

Because of that, if squatter sees a message id it hasn't indexed, it will
always return that id, because that id might contain the term, it doesn't
know.
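The two-phase behaviour described in that mailing-list post can be sketched roughly as follows. This is only a toy model, not Cyrus code: `ToyMailbox`, `run_indexer`, `candidates`, and `search` are hypothetical names, and the "index" here is a plain per-message word set rather than a real squat index. It just illustrates why unindexed messages show up as candidates and why the second pass is expected to drop them.

```python
from dataclasses import dataclass, field


def words(body: str) -> set[str]:
    """Lowercased whitespace-separated words of a message body."""
    return set(body.lower().split())


@dataclass
class ToyMailbox:
    """Hypothetical in-memory stand-in for a mailbox plus its search index."""
    bodies: dict[int, str] = field(default_factory=dict)        # uid -> body text
    indexed: dict[int, set[str]] = field(default_factory=dict)  # uid -> indexed words

    def append(self, uid: int, body: str) -> None:
        """Deliver a message; it stays unindexed until the indexer runs."""
        self.bodies[uid] = body

    def run_indexer(self) -> None:
        """Index every message not yet covered (stand-in for a squatter run)."""
        for uid, body in self.bodies.items():
            if uid not in self.indexed:
                self.indexed[uid] = words(body)

    def candidates(self, term: str) -> list[int]:
        """Phase 1: UIDs that might match. Unindexed UIDs are always included."""
        term = term.lower()
        return sorted(
            uid for uid in self.bodies
            if uid not in self.indexed or term in self.indexed[uid]
        )

    def search(self, term: str) -> list[int]:
        """Phase 2: open each candidate and keep only the real matches."""
        term = term.lower()
        return [uid for uid in self.candidates(term)
                if term in words(self.bodies[uid])]


mbox = ToyMailbox()
mbox.append(1, "quarterly invoice attached")
mbox.append(2, "lunch on friday")
mbox.run_indexer()
mbox.append(3, "meeting notes")           # arrives after the indexer ran

candidates = mbox.candidates("invoice")   # includes the unindexed UID 3
results = mbox.search("invoice")          # UID 3 is filtered out again
```

In this model, the behaviour reported in this issue corresponds to the client receiving `candidates` rather than `results`.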

rsto commented

Thanks, I was not aware that this is a regression. @elliefm is this something for 3.6 or later?

I agree that this sounds like a regression, and we should fix it. And we should try to fix it for 3.4 if we can (since right now that's still the current stable release).

If you can fix it for master, with a regression test, then I can look at the shape of the fix and consider whether to backport it (and if so: how far back to take it). There's already a (very small) SearchSquat.pm that the test(s) could go in.

In the past I might have planned to fix it directly on 3.4 myself, and then forward port that. Except I don't know much about search, and I don't know how much it has diverged since 3.4 either, so I would probably end up bringing you in on it anyway. And that being the case, it's probably a better use of our time and respective expertise for you to fix it on master, and me to deal with the older branches.

Just to let you know I already opened this bug some time ago, but it was never addressed:

#3901

We're still stuck on 2.x because this behaviour is not acceptable in production.
I tried to look at the code and replicate what 2.x did to solve this, but it seems very hard to port because 3.x implements the filtering in a completely different way.