probcomp/bayeslite

CGPM_Metamodel should casefold categorical conditions


fsaad commented

Consider the lowercase value `meo`, which is converted to NULL and hence yields probability 1:

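```python
import bayeslite

# `query` is assumed to be a notebook helper (not shown here) that runs
# BQL and returns the result as a pandas DataFrame.
bdb = bayeslite.bayesdb_open('satellites.2048.bdb')
query(bdb, '''
    ESTIMATE PROBABILITY OF class_of_orbit = 'meo'
            GIVEN (period_minutes=850)
        WITHIN satellites_p;
''')
```

```
Out[1]:
   bql_pdf_joint(1, NULL, 5, 'meo', NULL, 10, 850)
0                                              1.0
```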

versus the uppercase value `MEO`, which is correctly converted to its small integer code:

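```python
import bayeslite

# `query` is assumed to be a notebook helper (not shown here) that runs
# BQL and returns the result as a pandas DataFrame.
bdb = bayeslite.bayesdb_open('satellites.2048.bdb')
query(bdb, '''
    ESTIMATE PROBABILITY OF class_of_orbit = 'MEO'
            GIVEN (period_minutes=850)
        WITHIN satellites_p;
''')
```

```
Out[2]:
   bql_pdf_joint(1, NULL, 5, 'MEO', NULL, 10, 850)
0                                         0.815711
```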
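A minimal sketch of the casefolding the title suggests, done in Python before the code lookup rather than in SQL. `build_code_map`, `encode`, and the codes below are hypothetical and are not part of bayeslite's API. The exact-match miss mirrors the NULL code in the first query above.

```python
from typing import Dict, Optional


def build_code_map(categories: Dict[str, int]) -> Dict[str, int]:
    """Re-key a value -> code map by casefolded value (hypothetical helper)."""
    folded: Dict[str, int] = {}
    for value, code in categories.items():
        key = value.casefold()
        # Two distinct categories that differ only by case cannot both
        # survive folding, so refuse rather than silently drop one.
        if key in folded:
            raise ValueError(f"categories differ only by case: {value!r}")
        folded[key] = code
    return folded


def encode(code_map: Dict[str, int], value: str) -> Optional[int]:
    """Casefold the condition value before lookup; None if still unknown."""
    return code_map.get(value.casefold())


# Hypothetical codes, for illustration only.
codes = {"GEO": 0, "LEO": 1, "MEO": 2, "Elliptical": 3}
print(codes.get("meo"))                      # None: exact-match miss
print(encode(build_code_map(codes), "meo"))  # 2
```

Unlike SQLite's `NOCASE`, `str.casefold` also folds non-ASCII text, and the collision check makes the lossy case explicit: a column whose categories differ only by case cannot be casefolded safely.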
fsaad commented

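Updating the schema of the `bayesdb_cgpm_category` table

https://github.com/probcomp/bayeslite/blob/master/src/metamodels/cgpm_metamodel.py#L58-L65

to declare the column as `value TEXT COLLATE NOCASE NOT NULL` will likely solve the issue, since SQLite would then compare condition values against the stored categories case-insensitively.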
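The effect of the proposed collation can be checked with Python's standard `sqlite3` module. In this sketch the `category` table is a hypothetical, simplified stand-in for `bayesdb_cgpm_category` (only a value/code pair, not its real columns), and the codes are illustrative rather than those stored in `satellites.2048.bdb`.

```python
import sqlite3


def lookup_code(collation: str, value: str):
    """Return the code stored for `value`, or None if no row matches."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in table; `collation` is BINARY (SQLite's default)
    # or NOCASE, the change proposed above.
    conn.execute(
        f"CREATE TABLE category ("
        f"value TEXT COLLATE {collation} NOT NULL, code INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO category (value, code) VALUES (?, ?)",
        [("GEO", 0), ("LEO", 1), ("MEO", 2), ("Elliptical", 3)],
    )
    # The column's collation governs `value = ?` comparisons.
    row = conn.execute(
        "SELECT code FROM category WHERE value = ?", (value,)
    ).fetchone()
    conn.close()
    return None if row is None else row[0]


print(lookup_code("BINARY", "meo"))  # None: case-sensitive miss
print(lookup_code("NOCASE", "meo"))  # 2
```

One caveat: SQLite's built-in `NOCASE` folds only the 26 ASCII letters, so categories with non-ASCII characters would still compare case-sensitively.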

`git blame` shows that @riastradh-probcomp authored the schema; perhaps he can weigh in on why `COLLATE NOCASE` was not used, here and likewise for the categorical code map in `crosscat.py`:

https://github.com/probcomp/bayeslite/blob/master/src/metamodels/crosscat.py#L78-L87