as_table_columns function is mishandling mixed data types
linglp opened this issue · 1 comments
Bug Report
I noticed that we cannot submit a manifest as a table if the manifest has a column with mixed integer and float values. See the issue here
After digging into the code, I noticed that this is because the as_table_columns function, which we import from table.py into schematic, cannot interpret mixed data types correctly. When a column contains both an integer and a float, infer_dtype from the pandas API reports the data type as mixed-integer-float. But since that doesn't match any of the data types defined here:
PANDAS_TABLE_TYPE = {
    'floating': 'DOUBLE',
    'decimal': 'DOUBLE',
    'integer': 'INTEGER',
    'boolean': 'BOOLEAN',
    'datetime64': 'DATE',
    'datetime': 'DATE',
    'date': 'DATE',
}
the code falls back to treating the column as a string.
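Here is a minimal sketch that reproduces the behavior described above. It assumes the column reaches pandas with object dtype (for example, values that were never coerced to a single numeric type), and it assumes the lookup falls back to 'STRING' through a dict .get() default. The mapping is copied from the snippet above, not imported from the client.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Local copy of the mapping quoted in this issue (not imported from table.py).
PANDAS_TABLE_TYPE = {
    'floating': 'DOUBLE',
    'decimal': 'DOUBLE',
    'integer': 'INTEGER',
    'boolean': 'BOOLEAN',
    'datetime64': 'DATE',
    'datetime': 'DATE',
    'date': 'DATE',
}

# An object-dtype column holding one Python int and one Python float.
mixed = pd.Series([1, 2.5], dtype=object)

# pandas labels this column 'mixed-integer-float'.
inferred = infer_dtype(mixed, skipna=True)

# Assumption: unknown labels fall back to 'STRING' via a .get() default.
column_type = PANDAS_TABLE_TYPE.get(inferred, 'STRING')

print(inferred, column_type)
```

Note that a Series built without dtype=object would be upcast to float64 by pandas and inferred as 'floating', so the problem only appears when the values stay as separate Python objects.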
Based on the documentation that I found here, Synapse table columns support the following data types: “STRING”, “DOUBLE”, “INTEGER”, “BOOLEAN”, “DATE”, “FILEHANDLEID”, “ENTITYID”, “LINK”, “LARGETEXT”, “USERID”. Could we add a new one called "NUMERIC" to handle columns with a mix of "integer" and "float" values? Or could we at least interpret a mix of integers and floats as "DOUBLE" so that it doesn't get interpreted as a string?
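To illustrate the second option, here is a rough sketch of mapping the mixed label to "DOUBLE". The names EXTENDED_TABLE_TYPE and infer_column_type are hypothetical and only for illustration; they are not part of the Synapse client or schematic, and the 'STRING' default is the same assumption as before.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical extended mapping: the original entries plus one new key
# for the label pandas gives to columns mixing ints and floats.
EXTENDED_TABLE_TYPE = {
    'floating': 'DOUBLE',
    'decimal': 'DOUBLE',
    'integer': 'INTEGER',
    'boolean': 'BOOLEAN',
    'datetime64': 'DATE',
    'datetime': 'DATE',
    'date': 'DATE',
    'mixed-integer-float': 'DOUBLE',  # proposed addition
}


def infer_column_type(series):
    """Hypothetical helper: return a Synapse column type name for a Series."""
    label = infer_dtype(series, skipna=True)
    # Assumption: anything unrecognized is still treated as STRING.
    return EXTENDED_TABLE_TYPE.get(label, 'STRING')


mixed_type = infer_column_type(pd.Series([1, 2.5], dtype=object))
int_type = infer_column_type(pd.Series([1, 2, 3]))
text_type = infer_column_type(pd.Series(['a', 'b']))

print(mixed_type, int_type, text_type)
```

With this change only the mixed integer/float case behaves differently; columns that mix numbers with text would still end up as strings.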
Thanks for your report! I will file a JIRA ticket to capture this work.