multimeric/PandasSchema

DateFormatValidation should have a toggle for allowing NaN values (empty cells in a CSV)

Abhisek1994Roy opened this issue · 1 comment

I have created a custom validation class to solve this:

import datetime

import pandas as pd
from pandas_schema.validation import _SeriesValidation


class CustomDateFormatValidation(_SeriesValidation):
    def __init__(self, date_format: str, nullable: bool = False, **kwargs):
        self.date_format = date_format
        self.nullable = nullable
        super().__init__(**kwargs)

    @property
    def default_message(self):
        return 'does not match the date format string "{}"'.format(self.date_format)

    def valid_date(self, val):
        # validate() casts the series to str, so an empty CSV cell (NaN) arrives here as 'nan'
        if self.nullable and val == 'nan':
            return True
        try:
            datetime.datetime.strptime(val, self.date_format)
            return True
        except (ValueError, TypeError):
            # ValueError: wrong format; TypeError: val is not a string
            return False

    def validate(self, series: pd.Series) -> pd.Series:
        return series.astype(str).apply(self.valid_date)

Should I make this change to the existing DateFormatValidation class and open a pull request?

Can you please check whether allow_empty helps? If not, can you test using this PR: #44
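
For anyone comparing approaches: the nullable check discussed in this issue matches on the string 'nan' after the cast to str, so a cell that literally contains the text "nan" would also pass. Below is a minimal sketch of one possible alternative that records missing cells before the cast. It uses plain pandas only; the function name validate_dates is hypothetical and is not part of PandasSchema.

```python
import datetime

import pandas as pd


def validate_dates(series: pd.Series, date_format: str, nullable: bool = False) -> pd.Series:
    """Hypothetical helper: True where a cell matches date_format."""

    def valid_date(val) -> bool:
        try:
            datetime.datetime.strptime(val, date_format)
            return True
        except (ValueError, TypeError):
            # ValueError: wrong format; TypeError: val is not a string
            return False

    # Record which cells are missing *before* casting to str,
    # so a literal "nan" string is not mistaken for an empty cell.
    missing = series.isna()
    matches = series.astype(str).apply(valid_date)
    # Missing cells pass only when nullable is enabled.
    return (matches | missing) if nullable else matches


# Second entry mimics an empty CSV cell; third is the literal text "nan".
dates = pd.Series(["2020-01-31", float("nan"), "nan", "31/01/2020"])
strict = validate_dates(dates, "%Y-%m-%d")
lenient = validate_dates(dates, "%Y-%m-%d", nullable=True)
```

With nullable=True only the genuinely missing cell is accepted in addition to the well-formed date; the literal "nan" string and the wrongly formatted date are still rejected.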