sigmorphon/2022SegmentationST

English and Czech test data part 2

slvnwhrl opened this issue · 2 comments

While preparing the test data of part 2 for prediction, I realised that some of the English sentences start with a space, as does one Czech sentence (which also contains two consecutive spaces at one point). I was wondering whether this is intentional or a mistake, as it does not seem to happen in the dev or training data.

Thanks!

Thanks for noticing it. I have just updated it in the English sentence test dataset. It only affected 5 rows.

The evaluation.py code actually ignores leading and trailing spaces in sentences, as well as consecutive spaces within them, so the results should be fine either way.
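If you want to normalize the sentences yourself before prediction, something like the following should work. This is only a rough sketch and not the code from evaluation.py; the function names `normalize_whitespace` and `find_irregular_rows` are made up for illustration, and it assumes you pass in a single sentence string rather than a whole tab-separated line (since `str.split()` also splits on tabs).

```python
def normalize_whitespace(sentence: str) -> str:
    """Drop leading/trailing whitespace and collapse internal runs to one space."""
    # str.split() with no arguments splits on any run of whitespace
    # and discards empty strings, so joining with " " normalizes the sentence.
    return " ".join(sentence.split())


def find_irregular_rows(sentences):
    """Return the 0-based indices of sentences that normalization would change."""
    return [i for i, s in enumerate(sentences) if s != normalize_whitespace(s)]


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the test data.
    examples = ["a clean sentence", " starts with a space", "two  spaces inside"]
    print(find_irregular_rows(examples))
    print([normalize_whitespace(s) for s in examples])
```

Running `find_irregular_rows` over the sentence column is also a quick way to check how many rows are affected in a given file.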