man-group/ArcticDB

Using 'and' in QueryBuilder queries does not produce error but lead to wrong results returned

grusev opened this issue · 0 comments

Describe the bug

Another usability note ... while playing with QueryBuilder I noticed that following query works correctly
q2 = QueryBuilder()
q2 = q2[q2["bool"] & (q2["int8"] > 5)]
while following also works, but incorrectly:
q2 = QueryBuilder()
q2 = q2[q2["bool"] and (q2["int8"] > 5)]
'and' is not supported by QueryBuilder as per doc, thus I think it would be great to eventually raise an error with a message like "QueryBuilder supports 'and' through '&'. Please, correct query".
Do not know if such things are planned but would be nice to have at some moment some checks there to prevent accidental wrong results produced.
Not sure if this has to be logged as issue ...

Steps/Code to Reproduce

from arcticdb.version_store.library import ReadRequest
from arcticdb.version_store.processing import QueryBuilder
import pandas as pd
from arcticdb.util.test import assert_frame_equal, create_df_index_rownum, create_df_index_datetime, random_string, dataframe_arctic_update, get_sample_dataframe, dataframe_dump_to_log, distinct_timestamps

def assert_frame_equal_row_range_fix(expected : pd.DataFrame, actual : pd.DataFrame):
expected.reset_index(inplace = True, drop = True)
actual.reset_index(inplace = True, drop = True)
assert_frame_equal(expected, actual)

def test_read_batch_query_with_and(arctic_library):
lib = arctic_library

symbol = "_s_"
df = get_sample_dataframe(100)

dfq = "bool == True and int8 > 5"
q = QueryBuilder()
q = q[q["bool"] and (q["int8"] > 5)]

lib.write(symbol, df)

batch = lib.read_batch(symbols=[ReadRequest(symbol, query_builder=q)])
df_result = df.query(dfq, inplace=False)

print("pandas")
print(df_result)
print("arctic")
print(batch[0])
print(batch[0].data)

assert_frame_equal_row_range_fix(df_result, batch[0].data)

Expected Results

There should have been some error explaining that "and" is not supported and must use '&' instead

OS, Python Version and ArcticDB Version

Python 3.11
Arctic top of master

Backend storage used

No response

Additional Context

Alex Owens
Yesterday at 17:11
When you say it "works" with and , what does it do? & works by overloading the and operator, but it isn't possible to do anything similar with and , so we're slightly at the mercy of Python here
17:13
Actually, maybe we could make it work by overloading bool
https://docs.python.org/3/reference/datamodel.html#object.__bool__
Python documentationPython documentation
3. Data model
Objects, values and types: Objects are Python’s abstraction for data. All data in a Python program is represented by objects or by relations between objects. (In a sense, and in conformance to Von ...

Alex Owens
Yesterday at 17:25
Looking at the docs, all objects evaluate to True in a boolean context unless you overload bool , so I'm guessing it just returns the whole dataframe without any filtering?

Georgi Rusev
Yesterday at 18:04
In all cases will return wrong results. In my case it returned the data with only right hand of 'and' applied. That is the reason I think that if this can be prevented it should be. Would save time to people looking what is wrong or eventually prevent them from making wrond extract and thinking those extracts are correct. The last case is worse of course

Alex Owens
Yesterday at 18:08
Suspect if we overload bool to throw an exception saying and/or/xor/not operands are not supported and &/|/^/~ should be used instead that would be sufficient