hypertrace/document-store

Add support for NOT IN filter in document store

avinashkolluru opened this issue · 4 comments

We currently support the following operators in the filter

    AND,
    OR,
    EQ,
    NEQ,
    GT,
    LT,
    GTE,
    LTE,
    IN,
    CONTAINS,
    EXISTS,
    NOT_EXISTS,
    LIKE

We need to extend this to support NOT_IN

Is it NOT_IN opposite to IN?

On the mongo side, the operator - in and nin can support both array and non-array fields.
e.g

db.inventory.insertMany([
   { item: "journal", qty: 25, tags: ["blank", "red"], size: { h: 14, w: 21, uom: "cm" } },
   { item: "mat", qty: 85, tags: ["gray"], size: { h: 27.9, w: 35.5, uom: "cm" } },
   { item: "mousepad", qty: 25, tags: ["gel", "blue"], size: { h: 19, w: 22.85, uom: "cm" } }
])

Below are valid queries:

db.inventory.find( { 'tags' : { $nin: [ "gel", "blank" ] } } )
db.inventory.find( { 'qty' : { $nin: [ 10, 25 ] } } )

However, that is not the case with Postgres. Currently, we have only support for non-array field type for the in operator.

CREATE TABLE mytestWithArray (id TEXT PRIMARY KEY,document jsonb NOT NULL,created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW())
INSERT INTO mytestWithArray (id,document) VALUES( 'default:testKey1', '{ "_id": "default:testKey1", "id": "testKey1", "results": [ { "product": "abc", "score": 10 },{ "product": "xyz", "score": 5 } ] }'::jsonb)
INSERT INTO mytestWithArray (id,document) VALUES( 'default:testKey2', '{ "_id": "default:testKey2", "id": "testKey2", "results": [ { "product": "abc", "score": 8 },{ "product": "xyz", "score": 7 } ] }' :: jsonb) 
INSERT INTO mytestWithArray (id,document) VALUES( 'default:testKey3', '{ "_id": "default:testKey3", "id": "testKey3", "results": [ { "product": "abc", "score": 7 },{ "product": "xyz", "score": 8 } ] }' :: jsonb) 
INSERT INTO mytestWithArray (id,document) VALUES( 'default:testKey4', '{ "_id": "default:testKey4", "id": "testKey4", "results": [ { "product": "abc", "score": 7 },{ "product": "def", "score": 8 } ] }' :: jsonb) 

currently valid query (for non-array field):

SELECT * FROM mytestwitharray WHERE (document->>'id') IN ('testKey1', 'testKey2')

For, array-field, values should be an exact match.

To support, both array and non-array fields, we will need a filter like below as there is no direct support in Postgres.
ex. the query for non-array field:

SELECT * FROM mytestwitharray WHERE ((jsonb_typeof(document->'id')='array' AND (document->>'id')::jsonb @> ANY ('{"[\"testKey1\"]"}'::jsonb[])) OR (jsonb_typeof(document->'id')!='array' AND json_build_array(document->>'id')::jsonb @> ANY ('{"[\"testKey1\"]"}'::jsonb[])))

e.g the query for array field:

SELECT * FROM mytestwitharray WHERE ((jsonb_typeof(document->'results')='array' AND (document->>'results')::jsonb @> ANY ('{"[{\"product\": \"xyz\"}]"}'::jsonb[])) OR (jsonb_typeof(document->'results')!='array' AND json_build_array(document->>'results')::jsonb @> ANY ('{"[{\"product\": \"xyz\"}]"}'::jsonb[])))

The above query will not be that performant and most of the time if the query is on the non-array field. So, would suggest that we should introduce a few new operators at the document store interface for handling the array e.g NON_IN_ARRAY. This way, the client can control. For, now, I am moving with an opposite implementation for not_in of in by supporting the non-array type first on the Postgres side.

Closing this as it is added.