Sage-Bionetworks/synapsePythonClient

as_table_columns function is mishandling mixed data types

linglp opened this issue · 1 comments

Bug Report

I noticed that we cannot submit a manifest as a table if a column in the manifest mixes integer and float values. See the issue here

After digging into the code, I found that this happens because the as_table_columns function, which we import from table.py into schematic, does not interpret mixed data types correctly. When a column contains both an integer and a float, infer_dtype from the pandas API reports the data type as mixed-integer-float. But since that does not match any of the data types defined here:

PANDAS_TABLE_TYPE = {
    'floating': 'DOUBLE',
    'decimal': 'DOUBLE',
    'integer': 'INTEGER',
    'boolean': 'BOOLEAN',
    'datetime64': 'DATE',
    'datetime': 'DATE',
    'date': 'DATE',
}

the code falls back to treating the column as a string.
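A minimal sketch of the behavior, for reference. It assumes the column is held as object dtype (pandas would otherwise upcast the values to float64) and that the lookup falls back to 'STRING' for unknown labels, as described above; the actual lookup inside as_table_columns may be written differently.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Mapping copied from the issue text above.
PANDAS_TABLE_TYPE = {
    'floating': 'DOUBLE',
    'decimal': 'DOUBLE',
    'integer': 'INTEGER',
    'boolean': 'BOOLEAN',
    'datetime64': 'DATE',
    'datetime': 'DATE',
    'date': 'DATE',
}

# Object-dtype column holding one int and one float.
mixed = pd.Series([1, 2.5], dtype=object)

# pandas labels this column 'mixed-integer-float'.
inferred = infer_dtype(mixed, skipna=True)

# Assumption: unknown labels fall back to 'STRING'.
column_type = PANDAS_TABLE_TYPE.get(inferred, 'STRING')

print(inferred, column_type)
```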

Based on the documentation that I found here, Synapse table columns have the following data types: “STRING”, “DOUBLE”, “INTEGER”, “BOOLEAN”, “DATE”, “FILEHANDLEID”, “ENTITYID”, “LINK”, “LARGETEXT”, “USERID”. Can we then have a new one called "NUMERIC" to handle columns with a mix of "integer" and "float"? Or can we at least interpret a mix of integer and float as "DOUBLE" so that it won't get interpreted as a string?
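To illustrate the second option, here is a rough sketch of mapping the mixed label to "DOUBLE". The name infer_column_type and the extended mapping are hypothetical and only for illustration; they are not part of synapseclient.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Mapping from the issue, plus one extra entry (the proposed change).
PANDAS_TABLE_TYPE_PROPOSED = {
    'floating': 'DOUBLE',
    'decimal': 'DOUBLE',
    'integer': 'INTEGER',
    'boolean': 'BOOLEAN',
    'datetime64': 'DATE',
    'datetime': 'DATE',
    'date': 'DATE',
    # Proposed: treat a mix of ints and floats as DOUBLE.
    'mixed-integer-float': 'DOUBLE',
}


def infer_column_type(column: pd.Series) -> str:
    """Hypothetical helper: map a pandas column to a Synapse column type."""
    inferred = infer_dtype(column, skipna=True)
    # Anything not in the mapping is still treated as STRING.
    return PANDAS_TABLE_TYPE_PROPOSED.get(inferred, 'STRING')


mixed_type = infer_column_type(pd.Series([1, 2.5], dtype=object))
int_type = infer_column_type(pd.Series([1, 2]))
text_type = infer_column_type(pd.Series(['a', 'b']))
```

With this entry in place, the mixed column would be typed as "DOUBLE", while integer-only and text columns keep their current types.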

Thanks for your report! I will file a JIRA ticket to capture this work.