josiahcarlson/rom

Question/proposal for NOT IN/not startswith

bgervan opened this issue · 3 comments

Hi,
I ran through the code and didn't find a feature for negating a filter.
It would be great if there were a workaround for that, like "not startswith" or something similar.

Resource:
https://stackoverflow.com/questions/34174614/how-to-write-not-in-redis-query

Is there a way to do that?

For Rom to do its work, it (generally) uses set intersections and unions. There is also SDIFFSTORE, but to use it you would have to construct a set that explicitly contained the things you wanted to exclude, then take your "all" set and subtract those values from it (assuming such an "all" set existed in the first place).
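The SDIFFSTORE approach can be modeled with plain Python sets. This is a sketch, not Rom code: a dict stands in for the stored data, and `ids_not_matching` is a hypothetical helper. In Redis both sets would be keys, and SDIFFSTORE would store the difference server-side.

```python
def ids_not_matching(values_by_id, prefix):
    # The "all" set: every id we know about.
    all_ids = set(values_by_id)
    # The set that has to be built explicitly: ids whose value *does*
    # start with the prefix we want to exclude.
    excluded_ids = {i for i, v in values_by_id.items() if v.startswith(prefix)}
    # SDIFFSTORE-style subtraction: all minus excluded.
    return all_ids - excluded_ids


rows = {1: "apple", 2: "apricot", 3: "banana", 4: "cherry"}
print(sorted(ids_not_matching(rows, "ap")))  # [3, 4]
```

Both inputs must already exist, which is the catch: something has to materialize the excluded set and the "all" set before the subtraction can run.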

The link you provided is about Redis itself, which also can't filter arbitrary keys for things that do not match a pattern. You get pattern matching, not exclusion.
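Since Redis can only select keys that match a glob (for example with SCAN's MATCH option), any negation has to happen on the client. The sketch below uses Python's `fnmatch.fnmatchcase` as a stand-in matcher. Its glob syntax is close to, but not identical with, Redis's (Redis negates a character class with `[^...]`, fnmatch with `[!...]`). With redis-py the input keys could come from `scan_iter()`. Here they are a plain list, and `keys_not_matching` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def keys_not_matching(keys, pattern):
    # Redis can tell us which keys match; keeping the ones that
    # *don't* match is a client-side filter.
    return [k for k in keys if not fnmatchcase(k, pattern)]


keys = ["user:1", "user:2", "session:9", "cache:x"]
print(keys_not_matching(keys, "user:*"))  # ['session:9', 'cache:x']
```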

Regarding your specific example: sure. We can write something that skips values with a specific prefix and iterates over chunks of Redis entries. Since nearly everything in the prefix index is ordered by a score derived from the value's leading bytes, we can rule out huge swaths of entries that don't start with the same bytes using the score alone. However, that score only covers the first 7 bytes, so if you need to support prefixes longer than 7 bytes, you basically need to pull down all the values that share those bytes and check their prefixes.
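The 7-byte cutoff follows from how an order-preserving prefix score works. The function below is an illustrative assumption, not necessarily rom's exact scoring. It encodes each of the first 7 bytes as a base-258 digit (byte + 1, or 0 when the byte is missing) and uses Python integers instead of floats.

```python
def prefix_score(value, width=7):
    # Illustrative only: each of the first `width` bytes is one base-258
    # digit (byte + 1 if present, 0 if missing), so shorter values sort first.
    score = 0
    for i in range(width):
        digit = value[i] + 1 if i < len(value) else 0
        score = score * 258 + digit
    return score


# Lexicographic order survives: "ab" < "abc" < "abd" < "ac".
print(prefix_score(b"ab") < prefix_score(b"abc") < prefix_score(b"abd") < prefix_score(b"ac"))  # True

# Past 7 bytes, values collide: only b"abcdefg" contributes here.
print(prefix_score(b"abcdefgX") == prefix_score(b"abcdefgY"))  # True
```

Under this scheme every value starting with a prefix of up to 7 bytes falls in one contiguous score range. For `b"ab"` that range runs from `prefix_score(b"ab")` up to, but not including, `prefix_score(b"ac")`, and a range query can skip it in one step. A longer prefix only pins down the single 7-byte score it shares with other values, so those entries need a byte-by-byte check.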

So... something like this will get you started for now.

Since I've written it, I'll probably add it somewhere, but it's probably not going to be integrated with the rest of the query stuff in a typical fashion.

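```python
from rom.util import session, _prefix_score


def zrange_limit_iterator(conn, key, min_score, max_score, count=100):
    # Page through ZRANGEBYSCORE results `count` members at a time,
    # using redis-py's start/num arguments for the LIMIT clause.
    offset = 0
    while True:
        chunk = conn.zrangebyscore(key, min_score, max_score, start=offset, num=count)
        if chunk:
            yield chunk
        if len(chunk) < count:
            break
        offset += len(chunk)


def _yield_entities(model, ids):
    # Load the entities, hand them out, then drop each one from rom's
    # session cache so a long scan doesn't keep everything in memory.
    if ids:
        for entity in model.get(list(ids)):
            if entity is not None:
                yield entity
                session.forget(entity)


def does_not_startwith(model, colname, values, chunksize=100):
    # Prefix index for `colname`: members look like b"<value>\0<id>",
    # scored by the value's first 7 bytes.
    idx = model._namespace + ":" + colname + ":pre"
    c = model._connection

    # Group the excluded prefixes by their prefix score.
    exclude = {}
    for v in values:
        psv = _prefix_score(v)
        if isinstance(v, str):
            v = v.encode("utf-8")
        exclude.setdefault(psv, set()).add(v)

    last = "-inf"
    # _prefix_score() may return its score as a string, so sort numerically.
    for exc in sorted(exclude, key=float):
        # Entries scored strictly below this prefix score can't match it.
        for chunk in zrange_limit_iterator(c, idx, last, "(" + str(exc), chunksize):
            ids = set(int(p.rpartition(b"\0")[-1]) for p in chunk)
            yield from _yield_entities(model, ids)

        # Entries with exactly this score: for prefixes longer than 7 bytes,
        # compare the stored bytes; otherwise the whole score is skipped.
        # NOTE: a prefix shorter than 7 bytes also matches entries scored
        # above `exc`, which this sketch does not skip.
        m = exclude[exc]
        if max(map(len, m)) > 7:
            for chunk in zrange_limit_iterator(c, idx, exc, exc, chunksize):
                ids = set()
                for p in chunk:
                    pre, _, _id = p.rpartition(b"\0")
                    if not any(pre.startswith(v) for v in m):
                        ids.add(int(_id))
                yield from _yield_entities(model, ids)
        last = "(" + str(exc)

    # Everything scored above the last excluded prefix.
    for chunk in zrange_limit_iterator(c, idx, last, "+inf", chunksize):
        ids = set(int(p.rpartition(b"\0")[-1]) for p in chunk)
        yield from _yield_entities(model, ids)
```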
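To experiment with the skip-then-check idea without Redis or rom, the sorted set can be modeled as a sorted Python list of `(score, member)` pairs. Everything here is a hypothetical stand-in. `prefix_score` is an illustrative integer score over the first 7 bytes (not rom's float score), `FakeIndex.range` mimics only ZRANGEBYSCORE with a half-open range and an offset/limit, and `ids_not_starting_with` is not rom's released `does_not_startwith()`. Unlike the rough version in the thread, it skips the whole score range of a short prefix and byte-checks only where a longer prefix shares a score with other values.

```python
import bisect

WIDTH = 7  # only the first 7 bytes contribute to the score


def prefix_score(value):
    # Illustrative order-preserving score (an assumption, not rom's exact
    # function): base-258 digits, byte + 1 if present, 0 if missing.
    score = 0
    for i in range(WIDTH):
        score = score * 258 + (value[i] + 1 if i < len(value) else 0)
    return score


class FakeIndex:
    # Hypothetical in-memory stand-in for a sorted set whose members are
    # b"<value>\0<id>", ordered by (score, member) like Redis.
    def __init__(self, rows):
        self.entries = sorted(
            (prefix_score(v), v + b"\0" + str(i).encode()) for i, v in rows.items()
        )
        self.scores = [s for s, _ in self.entries]

    def range(self, lo, hi, offset, count):
        # Members with lo <= score < hi, paged like ZRANGEBYSCORE ... LIMIT.
        start = bisect.bisect_left(self.scores, lo)
        stop = bisect.bisect_left(self.scores, hi)
        return [m for _, m in self.entries[start:stop][offset:offset + count]]


def chunks(index, lo, hi, count):
    # Page through [lo, hi) `count` members at a time.
    offset = 0
    while True:
        chunk = index.range(lo, hi, offset, count)
        if chunk:
            yield chunk
        if len(chunk) < count:
            return
        offset += len(chunk)


def exclusion_ranges(prefixes):
    # Every value starting with p scores in [lo, hi). Prefixes of up to 7
    # bytes own that whole range; longer ones share one score with other
    # values, so entries at that score need a byte-by-byte check.
    ranges = []
    for p in prefixes:
        lo = prefix_score(p)
        hi = lo + 258 ** max(0, WIDTH - len(p))
        ranges.append((lo, hi, len(p) > WIDTH, p))
    # Outer ranges first; a "skip all" range sorts before an equal "check" one.
    ranges.sort(key=lambda r: (r[0], -r[1], r[2]))
    merged = []  # [lo, hi, prefixes to check, or None to skip everything]
    for lo, hi, check, p in ranges:
        if merged and lo < merged[-1][1]:
            # Prefix ranges nest or are disjoint, so this one is covered.
            if merged[-1][2] is not None:
                merged[-1][2].add(p)
            continue
        merged.append([lo, hi, {p} if check else None])
    return merged


def ids_not_starting_with(index, prefixes, count=2):
    # Yield ids whose value starts with none of `prefixes`.
    def ids(chunk, check=None):
        for member in chunk:
            value, _, _id = member.rpartition(b"\0")
            if check is None or not any(value.startswith(p) for p in check):
                yield int(_id)

    pos = 0
    for lo, hi, check in exclusion_ranges(prefixes):
        for chunk in chunks(index, pos, lo, count):  # gap: nothing here can match
            yield from ids(chunk)
        if check is not None:  # shared score: compare the stored bytes
            for chunk in chunks(index, lo, hi, count):
                yield from ids(chunk, check)
        pos = hi
    for chunk in chunks(index, pos, float("inf"), count):
        yield from ids(chunk)


rows = {1: b"apple", 2: b"apricot", 3: b"banana", 4: b"bandanna",
        5: b"cherry", 6: b"cherrystone", 7: b"cherrypicker"}
index = FakeIndex(rows)
print(sorted(ids_not_starting_with(index, [b"ap", b"cherryst"])))  # [3, 4, 5, 7]
```

The 2-byte prefix `b"ap"` is skipped as one score range. The 8-byte `b"cherryst"` forces a byte check on the entries scored like `b"cherrys"`, which here is only `cherrystone`.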

Okay, the above doesn't quite work, obviously. I've got a version of does_not_startwith() and does_not_endwith(), with tests, that I'm about to release as part of version 1.0.0. I'll ping this thread when it's released.

This is now in rom 1.0.0, which has been released.