OasisLMF/ODS_Tools

Loading blank fields from DataFrame creates string value 'None'

Issue Description

When loading a DataFrame into the OedSource class, a blank field (None) is converted to the string "None".

This causes ReinsScope to fail validation on file upload in the OasisPlatform with:

> oed_validation_errors
[{'name': 'ri_scope', 'source': {'source_type': 'DataFrame'}, 'msg': 'invalid CountryCode.\n   ReinsNumber PortNumber AccNumber    LocNumber CountryCode\n0            1          1    A11111  10002082047        None\n1            1          1    A11111  10002082048        None'}]

Validation fails because the string "None" is not recognised as a valid blank value:

> country_only_df
   ReinsNumber PortNumber AccNumber PolNumber LocGroup    LocNumber  ... ProducerName   LOB CountryCode ReinsTag CededPercent  OEDVersion
0            1          1    A11111      None     None  10002082047  ...         None  None        None     None          0.1       2.0.0
1            1          1    A11111      None     None  10002082048  ...         None  None        None     None          0.2       2.0.0

[2 rows x 13 columns]

> country_only_df['CountryCode'][0]
'None'
> type(country_only_df['CountryCode'][0])
<class 'str'>

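The conversion appears to happen in the following function, where columns with the 'category' dtype are first cast to 'str':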

def from_dataframe(cls, exposure, oed_type, oed_df: pd.DataFrame):
    """
    OedSource Constructor from a filepath
    Args:
        exposure (OedExposure): Exposure the oed source is part of
        oed_type (str): type of file (Loc, Acc, ..)
        oed_df (pd.DataFrame): DataFrame that represent the Oed Source
    Returns:
        OedSource
    """
    oed_source = cls(exposure, oed_type, 'orig', {'orig': {'source_type': 'DataFrame'}})
    ods_fields = exposure.get_input_fields(oed_type)
    pd_dtype = {}
    to_tmp_dtype = {}
    column_to_field = OedSchema.column_to_field(oed_df.columns, ods_fields)
    for column in oed_df.columns:
        if column in column_to_field:
            pd_dtype[column] = column_to_field[column]['pd_dtype']
        else:
            pd_dtype[column] = 'category'
        if pd_dtype[column] == 'category':  # we need to convert to str first
            to_tmp_dtype[column] = 'str'
        elif pd_dtype[column].startswith('Int'):
            to_tmp_dtype[column] = 'float'
    oed_df = oed_df.astype(to_tmp_dtype).astype(pd_dtype)
    oed_df = cls.prepare_df(oed_df, column_to_field, ods_fields)
    if exposure.use_field:
        oed_df = OedSchema.use_field(oed_df, ods_fields)
    oed_source.dataframe = oed_df
    oed_source.loaded = True
    return oed_source
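
One possible workaround, sketched under assumptions rather than taken from ODS_Tools: stringify only the non-missing values before the category conversion, so that None never passes through a string cast. The helper name `to_str_keep_blank` is hypothetical, and the example runs on a plain pandas Series rather than on the real OedSource pipeline.

```python
import pandas as pd


def to_str_keep_blank(series: pd.Series) -> pd.Series:
    """Hypothetical helper (not part of ODS_Tools).

    Stringify the values of a Series while leaving blanks as missing values.
    """
    as_object = series.astype(object)
    # Series.where keeps the original value where the condition is True
    # (here: the value is missing) and takes the stringified value elsewhere.
    return as_object.where(as_object.isna(), as_object.map(str))


# Illustrative data only: a blank, a string and a number in one column.
country_code = pd.Series([None, "GB", 12])
cleaned = to_str_keep_blank(country_code).astype("category")
```

With this approach the blank entry stays missing after the cast to 'category', and no "None" category is created. Whether a similar step could replace the intermediate 'str' cast inside from_dataframe without side effects on prepare_df is untested here.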