Question/proposal for NOT IN/not startswith
bgervan opened this issue · 3 comments
Hi,
I ran through the code and didn't find a feature for negating a filter.
It would be great if there were a workaround for that, like not startswith or something similar.
Resource:
https://stackoverflow.com/questions/34174614/how-to-write-not-in-redis-query
Is there a way to do that?
In order for Rom to do its work, it (generally) uses set intersections and unions. While there is also an "sdiffstore", you would have to construct a set that explicitly included the things you wanted to exclude, then take your "all set" and subtract those values from it (if there was such an "all set" in the first place).
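To make the set arithmetic concrete, here is a minimal sketch that uses plain Python sets in place of Redis sets. The ids, the set names, and the `not_in` helper are made up for illustration; in Redis the subtraction would correspond to SDIFF/SDIFFSTORE against a set holding every entity id, and this sketch simply assumes such a set exists.

```python
# Illustrative only: Python sets stand in for Redis sets of entity ids.
all_ids = {1, 2, 3, 4, 5}   # hypothetical "all set" of every entity id
unwanted_ids = {2, 4}       # ids that match the filter we want to negate


def not_in(all_ids, excluded_ids):
    """Return the ids in all_ids that are not in excluded_ids (like SDIFF)."""
    return all_ids - excluded_ids


remaining = not_in(all_ids, unwanted_ids)
print(sorted(remaining))
```

The catch is the one described above: without a maintained "all set", there is nothing to subtract from.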
The link you provided is about Redis itself, which also has no way to filter keys for things that do not match a pattern. You get pattern matching, not exclusion.
Regarding your specific example: sure. We can write something that doesn't match a specific prefix and iterates over chunks of Redis entities. Since most everything is prefixed by score anyway, we can definitely exclude huge swaths of the world that don't start with the same stuff. However, if you need to support prefixes longer than 7 bytes, you basically need to pull down all the values and check their prefixes.
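As a rough illustration of why longer prefixes force a value-by-value check, here is a small sketch. `toy_prefix_score` and `not_startswith` are hypothetical helpers written for this example; the scoring is not rom's `_prefix_score`, it only mimics the idea that a score can capture just the first 7 bytes of a value.

```python
def toy_prefix_score(value, width=7):
    """Hypothetical score: pack the first `width` bytes into one integer.

    This is NOT rom's _prefix_score; it only illustrates the truncation.
    """
    if isinstance(value, str):
        value = value.encode("utf-8")
    # Pad short values so a prefix never scores above values that extend it.
    head = value[:width].ljust(width, b"\0")
    return int.from_bytes(head, "big")


def not_startswith(values, prefix):
    """Keep the values that do not start with prefix, checking the real bytes."""
    return [v for v in values if not v.startswith(prefix)]


# Both values share their first 7 bytes, so the score cannot tell them apart.
candidates = [b"abcdefgh-1", b"abcdefgX-2"]
print(toy_prefix_score(candidates[0]) == toy_prefix_score(candidates[1]))
# Only a check against the actual values separates them.
print(not_startswith(candidates, b"abcdefgh"))
```

With a score like this, anything past the seventh byte is invisible to the index, which is why the values themselves have to be fetched and compared.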
So... something like this will get you started for now.
Since I've written it, I'll probably add it somewhere, but it's probably not going to be integrated with the rest of the query stuff in a typical fashion.
from rom.util import session, _prefix_score

def zrange_limit_iterator(conn, key, start, end, count=100):
    # Page through the ZSET members scored between start and end,
    # `count` members at a time.
    offset = 0
    lc = count
    while lc == count:
        chunk = conn.zrangebyscore(key, start, end, start=offset, num=count)
        yield chunk
        lc = len(chunk)
        offset += lc

def does_not_startwith(model, colname, values, chunksize=100):
    idx = model._namespace + ":" + colname + ":pre"
    c = model._connection
    # Group the excluded prefixes by their prefix score.
    exclude = {}
    for v in values:
        psv = _prefix_score(v)
        if psv not in exclude:
            exclude[psv] = set()
        if isinstance(v, str):
            v = v.encode("utf-8")
        exclude[psv].add(v)
    # int() tolerates _prefix_score returning an int or a numeric string.
    excludes = sorted(exclude, key=int, reverse=True)
    last = 0
    while excludes:
        # Smallest remaining excluded score.
        exc = excludes.pop()
        # things that are between the matched items
        for chunk in zrange_limit_iterator(c, idx, last, "(" + str(exc), chunksize):
            ids = set(int(p.rpartition(b"\0")[-1]) for p in chunk)
            if ids:
                found = model.get(list(ids))
                if found:
                    yield from found
                    session.forget(found)
        # things that match the prefix score, but which don't match the prefix
        # values provided...
        m = exclude.pop(exc)
        max_length = max(map(len, m))
        if max_length > 7:
            for chunk in zrange_limit_iterator(c, idx, exc, "(" + str(int(exc) + 1), chunksize):
                ids = set()
                for p in chunk:
                    pre, _, _id = p.rpartition(b"\0")
                    for v in m:
                        if pre.startswith(v):
                            break
                    else:
                        # No excluded prefix matched this member.
                        ids.add(int(_id))
                if ids:
                    found = model.get(list(ids))
                    if found:
                        yield from found
                        session.forget(found)
        # Continue the next "between" scan just past this score.
        last = int(exc) + 1
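The per-member check in the second loop can be tried on its own, without Redis. This is a minimal sketch, assuming each index member is the column value, a NUL byte, then the entity id (which is what the `rpartition(b"\0")` calls above rely on). `ids_not_matching` and the sample members are hypothetical names and data for illustration only.

```python
def ids_not_matching(members, prefixes):
    """Return ids of members whose value starts with none of the prefixes.

    Each member is assumed to be the value, a NUL byte, then the id.
    """
    ids = set()
    for member in members:
        value, _, raw_id = member.rpartition(b"\0")
        if not any(value.startswith(p) for p in prefixes):
            ids.add(int(raw_id))
    return ids


# Made-up chunk of index members, as zrangebyscore might return them.
chunk = [b"apple\x001", b"apricot\x002", b"banana\x003"]
print(sorted(ids_not_matching(chunk, {b"ap"})))
```

Using `any()` here is just a compact equivalent of the for/else loop: an id is kept only when no excluded prefix matches.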
Okay, the code I posted above doesn't quite work, obviously. I've got versions of does_not_startwith() and does_not_endwith(), with tests, that I'm about to release as part of version 1.0.0. I'll ping this thread when it's released.
This is now in rom 1.0.0, which has been released.