CGPM_Metamodel should casefold categorical conditions
fsaad commented
Consider using lower case 'meo', which is being converted to null and hence yields probability 1:
bdb = bayeslite.bayesdb_open('satellites.2048.bdb')
query(bdb, '''
ESTIMATE PROBABILITY OF class_of_orbit = 'meo'
GIVEN (period_minutes=850)
WITHIN satellites_p;
''')
Out[1]:
bql_pdf_joint(1, NULL, 5, 'meo', NULL, 10, 850)
0 1.0
versus using upper case 'MEO', which is being converted to the correct small integer code:
bdb = bayeslite.bayesdb_open('satellites.2048.bdb')
query(bdb, '''
ESTIMATE PROBABILITY OF class_of_orbit = 'MEO'
GIVEN (period_minutes=850)
WITHIN satellites_p;
''')
Out[2]:
bql_pdf_joint(1, NULL, 5, 'MEO', NULL, 10, 850)
0 0.815711
fsaad commented
Updating the schema of bayesdb_cgpm_category (https://github.com/probcomp/bayeslite/blob/master/src/metamodels/cgpm_metamodel.py#L58-L65) to use value TEXT COLLATE NOCASE NOT NULL will likely solve the issue.
git blame shows that @riastradh-probcomp is the author of the schema; perhaps he can weigh in on why COLLATE NOCASE was not used, and similarly for the categorical code map in crosscat.py (https://github.com/probcomp/bayeslite/blob/master/src/metamodels/crosscat.py#L78-L87).
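To illustrate why the collation matters, here is a minimal sketch using only Python's standard sqlite3 module. The table name category_demo, the helper lookup_code, and the code value 2 are hypothetical and chosen for illustration; this is not the actual bayesdb_cgpm_category schema, only the same column declaration style applied to a toy table.

```python
import sqlite3


def lookup_code(collation_clause, value):
    """Return the integer code stored for `value`, or None if no row matches.

    `collation_clause` is spliced into the column definition, so passing
    'COLLATE NOCASE' gives: value TEXT COLLATE NOCASE NOT NULL.
    """
    conn = sqlite3.connect(':memory:')
    # Hypothetical stand-in for a category-to-code table.
    conn.execute(
        'CREATE TABLE category_demo ('
        ' value TEXT %s NOT NULL,'
        ' code INTEGER NOT NULL)' % (collation_clause,))
    conn.execute(
        "INSERT INTO category_demo (value, code) VALUES ('MEO', 2)")
    row = conn.execute(
        'SELECT code FROM category_demo WHERE value = ?',
        (value,)).fetchone()
    conn.close()
    return None if row is None else row[0]


# Default (BINARY) collation: 'meo' does not match the stored 'MEO'.
binary_result = lookup_code('', 'meo')

# NOCASE collation on the column: 'meo' matches the stored 'MEO'.
nocase_result = lookup_code('COLLATE NOCASE', 'meo')

print(binary_result)
print(nocase_result)
```

With the default collation the lookup finds no row, which is consistent with the lower-case value ending up as null; with COLLATE NOCASE on the column the same lookup returns the stored code. Note that SQLite's built-in NOCASE collation folds only ASCII letters, so category values with non-ASCII characters would still compare case-sensitively.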