
CGPM_Metamodel should casefold categorical conditions


fsaad commented

Consider the lower case value 'meo', which is converted to NULL and hence yields probability 1:

bdb = bayeslite.bayesdb_open('satellites.2048.bdb')
query(bdb, '''
    ESTIMATE PROBABILITY OF class_of_orbit = 'meo'
            GIVEN (period_minutes=850)
        WITHIN satellites_p;
''')
   bql_pdf_joint(1, NULL, 5, 'meo', NULL, 10, 850)
0                                              1.0

Compare this with the upper case value 'MEO', which is converted to the correct small integer code:

bdb = bayeslite.bayesdb_open('satellites.2048.bdb')
query(bdb, '''
    ESTIMATE PROBABILITY OF class_of_orbit = 'MEO'
            GIVEN (period_minutes=850)
        WITHIN satellites_p;
''')
   bql_pdf_joint(1, NULL, 5, 'MEO', NULL, 10, 850)
0                                         0.815711

fsaad commented

Updating the schema of bayesdb_cgpm_category to use value TEXT COLLATE NOCASE NOT NULL will likely solve the issue.
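
A minimal sketch of the difference, using Python's sqlite3 module on an in-memory database. The table names category_binary and category_nocase, the two-column layout, and the codes 0, 1, 2 are hypothetical simplifications for illustration and are not the actual bayesdb_cgpm_category schema. Note that SQLite's built-in NOCASE collation only folds ASCII characters.

```python
import sqlite3


def lookup_code(conn, table, value):
    """Return the integer code stored for `value`, or None if no row matches."""
    # `table` is interpolated only because it comes from the fixed names below.
    row = conn.execute(
        "SELECT code FROM {} WHERE value = ?".format(table), (value,)
    ).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")

# Hypothetical stand-in with the default (case-sensitive) BINARY collation.
conn.execute(
    "CREATE TABLE category_binary (code INTEGER NOT NULL, value TEXT NOT NULL)"
)
# Same table, but with the column declared as proposed in the comment above.
conn.execute(
    "CREATE TABLE category_nocase "
    "(code INTEGER NOT NULL, value TEXT COLLATE NOCASE NOT NULL)"
)

# Illustrative category codes; the real codes in a .bdb file may differ.
rows = [(0, "GEO"), (1, "LEO"), (2, "MEO")]
for table in ("category_binary", "category_nocase"):
    conn.executemany(
        "INSERT INTO {} (code, value) VALUES (?, ?)".format(table), rows
    )

binary_meo = lookup_code(conn, "category_binary", "meo")  # no match
nocase_meo = lookup_code(conn, "category_nocase", "meo")  # matches 'MEO'
print(binary_meo, nocase_meo)
```

With the default collation the lower case lookup finds no row, which corresponds to the NULL seen in the first query; with COLLATE NOCASE on the column, the comparison against the bound parameter uses the column's collation and the same lookup returns the code stored for 'MEO'.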

git blame shows that @riastradh-probcomp is the author of the schema; perhaps he can weigh in on why COLLATE NOCASE was not used, and similarly for the categorical code map for in