nert-nlp/pastrie

Apostrophes removed in preprocessing?

nschneid opened this issue · 2 comments

Looking through the data, there are a LOT of sentences where clitics are tokenized off but lack an apostrophe. Is that just the genre or did they get lost in preprocessing?
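
A rough way to gauge how widespread this is might look like the sketch below. It is only a heuristic, not anything from the pastrie repo: `BARE_CLITICS` and `find_bare_clitics` are hypothetical names, the clitic list is an assumed set of English forms with the apostrophe stripped, and it assumes each sentence is available as a list of tokens.

```python
# Hypothetical set: English clitics as they would look with the apostrophe lost
# (n't -> nt, 's -> s, 're -> re, 've -> ve, 'll -> ll, 'm -> m, 'd -> d).
BARE_CLITICS = {"nt", "s", "re", "ve", "ll", "m", "d"}


def find_bare_clitics(tokens):
    """Return indices of tokens that look like clitics missing an apostrophe.

    A token is flagged when it matches a bare clitic form and follows an
    alphabetic token, since a clitic always attaches to a preceding word.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if i > 0 and tok.lower() in BARE_CLITICS and tokens[i - 1].isalpha():
            hits.append(i)
    return hits


# Made-up example sentence, not taken from the corpus.
print(find_bare_clitics("I do nt think it s broken".split()))
```

This will over-flag in places (a standalone "s", "d", or "re" can be legitimate in informal text), so the hits are candidates to inspect rather than confirmed errors.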

This is indeed a preprocessing issue. Will try to fix along with some others.