FoundationDB/fdb-record-layer

In-unions on aggregate indexes can be planned with insufficient comparison keys in the presence of repeated fields

alecgrieser opened this issue · 0 comments

There are currently cases in the planner where in-union plans over aggregate indexes are being generated with incorrect comparison keys because they do not include enough information in them to avoid de-duping unique groups. This ultimately stems from the way that the aggregate index handles computing the ordering key. Namely, if there is a repeated field, then the ordering key can end up ignoring fields in the index after the repeated field. I believe the intention of that code was not to say too much about the order in the case of repeated fields because of subtle changes to the cardinality that can occur within value indexes, though I don't believe that that should actually apply to aggregate indexes. The upshot of this behavior is that the final comparison key may not contain enough information to distinguish between different groups, which in turn means that the query may incorrectly de-dupe elements.

By way of example, if you had an index like:

new Index(indexName, field("a").groupBy(field("b", FanType.FanOut), field("c")), IndexTypes.PERMUTED_MAX, Map.of(IndexOptions.PERMUTED_SIZE, "1"))

Where elements are ordered by b (fanned-out), then max(a), then c, if you query the index with a query approximating

SELECT r.x, T.c, max(T.a) as m
    FROM T, (SELECT _0 AS x FROM T.b) r
    GROUP BY r.x, T.c
    HAVING r.x IN ?xList
    ORDER BY max(T.a)

Then that can end up being planned as an in-union over the index, but the only value in the comparison key is max(T.a) so any duplicate values for the maximum will be filtered out, which is incorrect.