Breaks when strings are found in a column that's of dtype float
Closed this issue · 0 comments
HackToHell commented
In schema.yml, the dtype of a column is specified as !!python/name:__builtin__.float. If that column contains a string value instead of a float, the loader returns no data frame, which causes the following error and stack trace:
AttributeError Traceback (most recent call last)
<ipython-input-4-66dedfb42d11> in <module>()
----> 1 dst=demo.load_dataset("stuff")
f:\python\pysemantic\pysemantic\project.pyc in load_dataset(self, dataset_name)
544 logger.info("Column rules:")
545 logger.info(json.dumps(column_rules, cls=TypeEncoder))
--> 546 return df_validator.clean()
547 else:
548 dfs = []
f:\python\pysemantic\pysemantic\validator.pyc in clean(self)
93 """Return the converted dataframe after enforcing all rules."""
94 if self.is_drop_na:
---> 95 x = self.data.shape[0]
96 self.data.dropna(inplace=True)
97 y = self.data.shape[0]
AttributeError: 'NoneType' object has no attribute 'shape'
Debugging shows that the ValueError raised in _load (could not convert string to float) is not handled properly by project.py.
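One possible way to handle this, sketched with plain pandas rather than pysemantic's actual _load (its internals are not shown in this report, so this is an assumption about the general shape of a fix, not a patch): catch the ValueError, re-read the file without the dtype constraint, and coerce the offending columns so that unparseable strings become NaN. The function name load_with_float_fallback and its parameters are hypothetical.

```python
import logging
import os
import tempfile

import pandas as pd

logger = logging.getLogger(__name__)


def load_with_float_fallback(path, float_columns):
    """Read a CSV with float dtypes enforced; fall back if a column is dirty.

    Hypothetical helper for illustration only, not part of pysemantic.
    """
    dtypes = {col: float for col in float_columns}
    try:
        return pd.read_csv(path, dtype=dtypes)
    except ValueError as err:
        # pandas raises ValueError when a string cannot be parsed as float.
        logger.warning("Falling back to lenient parsing: %s", err)
        df = pd.read_csv(path)
        for col in float_columns:
            # Strings that cannot be parsed as numbers become NaN.
            df[col] = pd.to_numeric(df[col], errors="coerce")
        return df


# Small demo: column "b" is declared float but contains the string "x".
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
    handle.write("a,b\n1.5,x\n2.5,3.0\n")
    csv_path = handle.name

try:
    df = load_with_float_fallback(csv_path, ["b"])
finally:
    os.remove(csv_path)

print(df)
```

With this approach the caller always gets a data frame back, so a later dropna would discard the bad rows instead of failing on None. Whether silently coercing is preferable to raising a clearer error is a design choice for the project.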