English and Czech test data part 2

Question

slvnwhrl opened this issue 3 years ago · 2 comments

While preparing the test data of part 2 for prediction, I realised that some of the English sentences start with a space, as does one Czech sentence (which also contains two consecutive spaces at some point). I was wondering whether this is intentional or a mistake, as it does not seem to happen in the dev or training data.

Thanks!

Answer 1 · 2022-05-10T10:30:00.000Z

Thanks for noticing. I have just fixed this in the English sentence test dataset. Only 5 rows were affected.
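
For anyone who wants to check their own copy of the data, here is a minimal sketch of how such rows could be located. It assumes the sentences have already been loaded into a plain list of strings; the function name `find_whitespace_issues` is hypothetical and not part of the task's code.

```python
def find_whitespace_issues(sentences):
    """Return the indices of sentences with stray whitespace.

    A sentence is flagged if it starts or ends with whitespace,
    or if it contains two consecutive spaces.
    """
    flagged = []
    for index, sentence in enumerate(sentences):
        has_outer_whitespace = sentence != sentence.strip()
        has_double_space = "  " in sentence
        if has_outer_whitespace or has_double_space:
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    # Made-up example sentences, not taken from the dataset.
    sample = ["A clean sentence.", " Leading space.", "Double  space."]
    print(find_whitespace_issues(sample))
```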

Answer 2 · 2022-05-10T10:31:18.000Z

The evaluation.py code ignores leading and trailing spaces in sentences, as well as double spaces within them, so this should not affect the results.
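
As an illustration of this kind of normalisation, here is a minimal sketch. It is not the actual evaluation.py code; the function name `normalize_whitespace` is an assumption made for this example.

```python
def normalize_whitespace(sentence: str) -> str:
    """Strip leading/trailing whitespace and collapse internal runs.

    str.split() with no arguments splits on any run of whitespace and
    drops empty strings, so joining the pieces with a single space
    removes leading, trailing, and repeated whitespace in one step.
    """
    return " ".join(sentence.split())


if __name__ == "__main__":
    # Made-up example sentence, not taken from the dataset.
    print(repr(normalize_whitespace(" A sentence  with stray spaces. ")))
```

Applying the same normalisation to your own inputs before prediction would keep them consistent with the dev and training data, which reportedly do not contain such spaces.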