jaidevd/pysemantic

Breaks when strings are found in a column that's of dtype float


In schema.yml, the dtype of a column is specified as !!python/name:__builtin__.float. However, if that column contains a string value instead of a float, the loader returns None instead of a data frame, which causes the following error and stack trace:
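A minimal reproduction of the underlying failure, assuming pandas is installed and assuming that pysemantic's loader ultimately passes the schema dtype to pandas.read_csv (the issue does not show the body of _load, so this is an assumption). The column name "price" and the value "unknown" are made up for illustration.

```python
import io

import pandas as pd

# A "price" column declared as float in the schema, but one row holds a string.
CSV_TEXT = "id,price\n1,9.99\n2,unknown\n3,4.50\n"


def load_strict(text):
    """Read the CSV while forcing the declared float dtype on "price"."""
    return pd.read_csv(io.StringIO(text), dtype={"price": float})


try:
    load_strict(CSV_TEXT)
except ValueError as exc:
    # pandas raises instead of returning a frame when the cast to float fails.
    print("ValueError:", exc)
```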

AttributeError                            Traceback (most recent call last)
<ipython-input-4-66dedfb42d11> in <module>()
----> 1 dst=demo.load_dataset("stuff")

f:\python\pysemantic\pysemantic\project.pyc in load_dataset(self, dataset_name)
    544             logger.info("Column rules:")
    545             logger.info(json.dumps(column_rules, cls=TypeEncoder))
--> 546             return df_validator.clean()
    547         else:
    548             dfs = []

f:\python\pysemantic\pysemantic\validator.pyc in clean(self)
     93         """Return the converted dataframe after enforcing all rules."""
     94         if self.is_drop_na:
---> 95             x = self.data.shape[0]
     96             self.data.dropna(inplace=True)
     97             y = self.data.shape[0]

AttributeError: 'NoneType' object has no attribute 'shape'
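The traceback shows that clean() receives None, so the real cause (the parse error) is lost by the time the AttributeError is raised. A sketch of a fail-fast guard follows; DatasetLoadError and load_or_raise are hypothetical names for illustration and are not part of pysemantic. The idea is to chain the original ValueError instead of passing None downstream.

```python
class DatasetLoadError(Exception):
    """Hypothetical error type used for illustration; not part of pysemantic."""


def load_or_raise(name, loader):
    """Call ``loader`` and fail fast instead of passing None downstream."""
    try:
        data = loader()
    except ValueError as exc:
        # Keep the original parse error (e.g. "could not convert string to float").
        raise DatasetLoadError(f"could not load dataset {name!r}: {exc}") from exc
    if data is None:
        raise DatasetLoadError(f"loader for dataset {name!r} returned None")
    return data


def broken_loader():
    # Stands in for a reader that hits a string in a float column.
    return float("unknown")


try:
    load_or_raise("stuff", broken_loader)
except DatasetLoadError as exc:
    print(exc)
    print("caused by:", repr(exc.__cause__))
```

With this guard the user sees which dataset failed and why, rather than a confusing 'NoneType' error from inside the validator.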

Debugging shows that the ValueError raised in _load (could not convert string to float) is not handled properly by project.py.
[Screenshot, 2015-06-17: debugger session in pysemantic/project.py]
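One possible way to handle the ValueError where the data is read is to retry without the float dtype and coerce the declared columns, so unparseable strings become NaN. This is a sketch under stated assumptions, not pysemantic's actual fix: it assumes pandas is installed, that the data is CSV, and load_with_coercion is a hypothetical name. It uses pandas.to_numeric with errors="coerce".

```python
import io

import pandas as pd


def load_with_coercion(text, float_columns):
    """Read CSV text, coercing declared float columns instead of failing.

    Hypothetical helper for illustration; not pysemantic's API.
    Values that cannot be parsed as floats become NaN.
    """
    try:
        return pd.read_csv(io.StringIO(text), dtype={c: float for c in float_columns})
    except ValueError:
        # Re-read without the float dtype, then convert each declared column,
        # turning unparseable strings into NaN.
        df = pd.read_csv(io.StringIO(text))
        for col in float_columns:
            df[col] = pd.to_numeric(df[col], errors="coerce")
        return df


df = load_with_coercion("id,price\n1,9.99\n2,unknown\n3,4.50\n", ["price"])
print(df.dtypes["price"])        # float64
print(df["price"].isna().sum())  # 1
print(len(df.dropna()))          # 2
```

Coercion trades an error for silently missing values; combined with a dropna step like the is_drop_na branch in clean(), the offending row would simply be dropped, which may or may not be what the schema author wants.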