OasisLMF/OasisPiWind

PiWind OED files do not match OasisLMF format expectations


The PiWind OED files contain non-integer values (e.g. "L1") in the AccNumber and LocNumber columns. When OasisLMF reads these files, it tries to parse these columns as integers and fails on such values.

Additionally, SourceLocOEDPiWind2 is missing the PortNumber column.

Happy to submit a PR with versions of the files that do run successfully, but perhaps it's OasisLMF that needs the change, not the files.

@DanielFEvans That is incorrect: the MDK does not process the loc. number (or the loc. ID col. in the keys file) and acc. number columns as integers; it treats them as strings. We fixed this in the last release (1.3.3). Have a look at the following step in GUL inputs generation, which loads the exposure file and keys file into frames

https://github.com/OasisLMF/OasisLMF/blob/800ff13d6ae62427687f44689df3824686a791e2/oasislmf/model_preparation/gul_inputs.py#L78

and this step in IL inputs generation to load the accounts file into a frame

https://github.com/OasisLMF/OasisLMF/blob/800ff13d6ae62427687f44689df3824686a791e2/oasislmf/model_preparation/il_inputs.py#L261

We ran tests on PiWind where these columns in the source exposure and account files had a mixture of integer and string values, and they ran fine.

Can you provide more details?
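
For readers following along, here is a minimal sketch of what "treating these columns as strings" can look like in pandas. It is not the OasisLMF code linked above; the `read_exposure` helper, the sample rows, and the `BuildingTIV` values are made up for illustration.

```python
import io

import pandas as pd

# Identifier columns named in this thread; forcing them to str keeps "1" and "L2" alike.
ID_COLUMNS = ["AccNumber", "LocNumber"]

# Hypothetical two-row exposure file with a mixed integer/alphanumeric LocNumber.
SAMPLE = (
    "AccNumber,LocNumber,BuildingTIV\n"
    "A11111,1,100000\n"
    "A11111,L2,250000\n"
)


def read_exposure(source):
    """Read an exposure CSV, keeping the identifier columns as strings."""
    return pd.read_csv(source, dtype={col: str for col in ID_COLUMNS})


df = read_exposure(io.StringIO(SAMPLE))
print(df["LocNumber"].tolist())
```

Only the identifier columns are pinned to `str`; the remaining columns are still inferred by pandas, so numeric fields stay numeric.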

Hi @sr-murthy - I thought it seemed a little bit strange, given that you did some changes along these lines a few days ago, and that the files were only changed recently.

I'm running PiWind via the OasisUI, having installed it using the OasisEvaluation setup script. When I load one of the PiWind portfolios and try to generate the input files, it fails with the following log output:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 374, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 629, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/worker/src/model_execution_worker/tasks.py", line 211, in generate_input
    GenerateOasisFilesCmd(argv=run_args).run()
  File "/usr/local/lib/python3.6/site-packages/argparsetree/cmd.py", line 161, in run
    return self.action(args) or 0
  File "/usr/local/lib/python3.6/site-packages/oasislmf/cli/model.py", line 333, in action
    ri_scope_fp=ri_scope_fp
  File "/usr/local/lib/python3.6/site-packages/oasislmf/utils/log.py", line 125, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/oasislmf/manager.py", line 385, in generate_oasis_files
    source_exposure_fp=exposure_fp
  File "/usr/local/lib/python3.6/site-packages/oasislmf/model_preparation/lookup.py", line 683, in save_results
    for r in results:
  File "/usr/local/lib/python3.6/site-packages/oasislmf/model_preparation/lookup.py", line 537, in get_results
    exposure_df = get_dataframe(**kwargs)
  File "/usr/local/lib/python3.6/site-packages/oasislmf/utils/data.py", line 134, in get_dataframe
    set_col_dtypes(df, col_dtypes)
  File "/usr/local/lib/python3.6/site-packages/oasislmf/utils/data.py", line 150, in set_col_dtypes
    df[col] = df[col].astype(PANDAS_BASIC_DTYPES[dtype])
  File "/usr/local/lib/python3.6/site-packages/pandas/util/_decorators.py", line 118, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 4004, in astype
    **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals.py", line 3462, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals.py", line 3329, in apply
    applied = getattr(b, f)(**kwargs)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals.py", line 544, in astype
    **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals.py", line 625, in _astype
    values = astype_nansafe(values.ravel(), dtype, copy=True)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/dtypes/cast.py", line 692, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas/_libs/lib.pyx", line 854, in pandas._libs.lib.astype_intsafe
  File "pandas/_libs/src/util.pxd", line 91, in util.set_value_at_unsafe
ValueError: invalid literal for int() with base 10: 'L2'

(Edit: Apologies for the poor formatting on that traceback; it's copy-pasted straight from the UI)

It sounds like the versions that OasisEvaluation pulls might have got out of step?
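
The failure at the bottom of that traceback can be reproduced in isolation. This is a small sketch, assuming only pandas; the `try_int_cast` helper and the sample rows are hypothetical and are not part of OasisLMF.

```python
import io

import pandas as pd

# Hypothetical exposure rows: one numeric and one alphanumeric location number.
CSV = "LocNumber,AccNumber\n1,A1\nL2,A1\n"


def try_int_cast(values):
    """Return (converted, None) on success or (None, error message) on failure."""
    try:
        return values.astype(int), None
    except ValueError as exc:
        return None, str(exc)


# Read everything as text first, then attempt the integer cast that the old code path made.
df = pd.read_csv(io.StringIO(CSV), dtype=str)
_, error = try_int_cast(df["LocNumber"])
print(error)
```

The cast succeeds on a purely numeric column and fails as soon as one value such as "L2" is present, which matches the behaviour reported above.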

@DanielFEvans You would have to ask @sambles or @mpinkerton-oasis about OasisEvaluation, but it sounds like the MDK version it installs isn't the latest. The latest version definitely treats not just the loc. number and acc. number columns, but also the portfolio number and policy number columns, as strings. The OED spec. defines these columns as alphanumeric, which is why they must be read as strings.

I've just run MDK 1.3.3 on PiWind master, which has an MDK config. file that defines 10-row and 10K-row exposure files with mixed integer and alphanumeric values for loc. numbers and string values for acc. numbers, and it runs fine.
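
One way to confirm which MDK version an environment actually has is to query the installed package metadata from inside that environment (for example, the worker container). This is a rough sketch using the standard library (Python 3.8+); the helper names are hypothetical and the version comparison is deliberately crude, looking only at the first three numeric components.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version):
    """Crude parse: the first three runs of digits, e.g. "1.3.3" -> (1, 3, 3)."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def is_at_least(version, minimum):
    """Compare two version strings on their numeric release components only."""
    return release_tuple(version) >= release_tuple(minimum)


found = installed_version("oasislmf")
if found is None:
    print("oasislmf is not installed in this environment")
else:
    status = "ok" if is_at_least(found, "1.3.3") else "older than 1.3.3"
    print(found, status)
```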

Oh dear, I see what I've done - I'd pulled the latest version of PiWind locally so I could input the portfolios to a remote test machine, but the Evaluation repository uses an older version. Very sorry for the false report.

On the point about the portfolio number column (PortNumber), yes this column must be present in the exposure file, as stated clearly in the OED spec. I believe all the OED exposure files in PiWind tests/data do have a PortNumber column.

I found that SourceLocOEDPiWind2.csv didn't have that column; the rest do.
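
A missing column like this can be caught before running the model by checking the file header. The sketch below assumes pandas; the `missing_columns` helper is hypothetical, and the `REQUIRED` list contains only the columns discussed in this thread, not the full set the OED spec. requires.

```python
import io

import pandas as pd

# Only the columns mentioned in this thread; the OED spec. defines the complete list.
REQUIRED = ["PortNumber", "AccNumber", "LocNumber"]


def missing_columns(source, required=REQUIRED):
    """Return the required column names that are absent from a CSV header."""
    header = pd.read_csv(source, nrows=0).columns  # reads the header row only
    return [col for col in required if col not in header]


# Hypothetical file that lacks PortNumber.
sample = io.StringIO("AccNumber,LocNumber\nA1,L1\n")
print(missing_columns(sample))
```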

That was a temporary file, which will be deleted. The test exposure files we use are:

SourceLocOEDPiWind10.csv
SourceLocOEDPiWind100.csv
SourceLocOEDPiWind1K.csv
SourceLocOEDPiWind10K.csv

There are some non-OED files in tests/data, which we will delete, as well as some other temporary files.