relatio-nlp/relatio

Fixes to split_into_sentences method in utils.py


When I used the split_into_sentences method on my .csv files, I ran into a problem with blank entries in text fields: they are stored as NaN in the DataFrame the method is run on. Because of this, the len() call inside split_into_sentences() failed, since len() is not defined for the float (NaN) type. I worked around it by calling df.fillna("", inplace=True) on my DataFrame, but tracking down the cause took a lot of work because the error did not make it obvious. It would be a good idea to handle this at the source, either by cleaning the input or by raising a clearer error message that points to the solution.
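A minimal sketch of the kind of input check suggested above, assuming split_into_sentences expects every entry to be a string. The helper prepare_texts and its fill_missing parameter are hypothetical, not part of relatio's API. It either fills missing values (None or float NaN, which is how pandas represents blank CSV cells) with "", mirroring the fillna workaround, or raises an error that names the offending entry and the fix.

```python
import math
from typing import Iterable, List


def prepare_texts(texts: Iterable[object], fill_missing: bool = True) -> List[str]:
    """Hypothetical pre-check run on texts before sentence splitting.

    Missing entries (None or float NaN) become "" when fill_missing is True;
    otherwise a ValueError names the entry and suggests the fix. Any other
    non-string value raises a TypeError with an explicit message, instead of
    failing later inside len().
    """
    cleaned: List[str] = []
    for i, value in enumerate(texts):
        if isinstance(value, str):
            cleaned.append(value)
        elif value is None or (isinstance(value, float) and math.isnan(value)):
            if not fill_missing:
                raise ValueError(
                    f"Entry {i} is missing (NaN/None). Fill blanks first, "
                    'e.g. df.fillna("", inplace=True).'
                )
            cleaned.append("")
        else:
            raise TypeError(
                f"Entry {i} has type {type(value).__name__}; expected str."
            )
    return cleaned


if __name__ == "__main__":
    # The original failure: len() is undefined for float NaN.
    try:
        len(float("nan"))
    except TypeError as err:
        print(err)  # object of type 'float' has no len()

    raw = ["First sentence. Second one.", float("nan"), None]
    print(prepare_texts(raw))  # ['First sentence. Second one.', '', '']
```

With a DataFrame, the same check could be applied to a text column via df["text"].tolist() (the column name is an assumption). Filling blanks keeps row counts aligned with the DataFrame, while fill_missing=False surfaces the problem early with a message that points to the fix.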

Thanks for raising this point. I will check your pull request and come back to you.