Text Column strange behavior on order_by and filter operations

Question

Text Column strange behavior on order_by and filter operations

healiseu opened this issue 6 years ago · 3 comments

Answer 1 · 2018-08-09T17:38:58.000Z

OK the difference between the SIMPLE/SIMPLE_CI and the IDENTITY/IDENTITY_CI is indeed well documented at the columns Module but it escaped my attention. Nevertheless I had to look under the hood where Redis creates its structures to understand that IDENTITY* is based on Redis SET and SIMPLE* is based on Redis ZSET and therefore you use the first for equality matching ONLY and the second for ordering ONLY. But read the update below, if you combine a prefix index with SIMPLE* keygen index the column can be used for filtering too with the like operator.

Now If I understand the logic on how to create indexes the case I have can be covered by having something like:

txt_identity_sorting = Text(index=True, keygen=IDENTITY, keygen2=????)

I do not want to duplicate my data, I suppose one solution is to use an additional zsort index on the enumerated type like the one defined in the SIMPLE keygen. What's the best solution for this, do I use a keygen2 or do I define an IndexOnly column. I am also a bit confused on the difference between keygen2 and IndexOnly. There are only two examples in your unit test file.

Finally, the question that arises in this simple demo case I described is whether you use a single lexicographic index, or perhaps bitmap index, that covers both filtering and sorting operations ? But maybe this is out of the scope for ROM.

Wish you a nice relaxed time on your vacations...

Answer 2 · 2018-08-09T22:13:32.000Z

I fully understand that trying to cover many generic query cases in ROM and at the same time maintaining a simple to use command line interface is not an easy task. I am solving a similar problem in TRIADB. Sorting and filtering are distinct operations and have to be treated separately so that they can be combined. But although this is true for the API logic, behind the scenes a memory/speed optimization can dictate a different approach, e.g. bitmap indexing.

Answer 3 · 2018-08-09T23:53:11.000Z

Update:

class TestText(Model):
    txt_simple_prefix = Text(index=True, keygen=SIMPLE, prefix=True

I added another Text column with SIMPLE keygen and the prefix index and then tested with the like operator:

TestText.query.like(txt_simple_prefix='BROWN').all()

And this also returns an empty result set.

But if I use a pattern e.g. ?BROWN or ?BR

TestText.query.select('txt_simple_prefix', ff=_tuple_data_factory).like(txt_simple_prefix='?BROWN').all()

Out[16]: [('BROWN',), ('BROWN',), ('BROWN',)]

TestText.query.select('txt_simple_prefix', ff=_tuple_data_factory).like(txt_simple_prefix='?BR').order_by('txt_simple_prefix').all()

Out[13]: [('BRONZE',), ('BROWN',), ('BROWN',), ('BROWN',)]

It seems there is no need to define a secondary IndexOnly column, is that right ?