Fix Extraction for Belarusian (and possibly others)
MichaelKohler opened this issue · 1 comment
MichaelKohler commented
In #118 we discovered that the automatic extraction on pull requests fails for Belarusian:
https://github.com/Common-Voice/cv-sentence-extractor/pull/118/checks?check_run_id=868518093
file_name = "/home/runner/work/cv-sentence-extractor/cv-sentence-extractor/text/AA/wiki_46"
Error: "stream did not contain valid UTF-8"
I've re-triggered the job several times, and each run failed on a different file. Happy to help out if somebody wants to debug this!
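The error text matches the `ErrorKind::InvalidData` error that Rust's standard library returns when `Read::read_to_string` (or `BufRead::lines`) encounters bytes that are not valid UTF-8. One way to narrow this down could be to check whether the reported file really contains invalid UTF-8, and at which byte offset. The sketch below assumes the extractor reads these files through `std::io` in that way; `first_invalid_utf8`, `read_strict` and `check_file` are hypothetical helpers for illustration, not part of cv-sentence-extractor.

```rust
use std::env;
use std::fs;
use std::io::{self, Cursor, ErrorKind, Read};
use std::path::Path;

/// Hypothetical helper: byte offset of the first invalid UTF-8 sequence,
/// or `None` if the whole buffer is valid UTF-8.
fn first_invalid_utf8(bytes: &[u8]) -> Option<usize> {
    match std::str::from_utf8(bytes) {
        Ok(_) => None,
        Err(e) => Some(e.valid_up_to()),
    }
}

/// Hypothetical helper: a strict read that fails with ErrorKind::InvalidData
/// on invalid UTF-8, which is what `Read::read_to_string` does.
fn read_strict<R: Read>(mut reader: R) -> io::Result<String> {
    let mut text = String::new();
    reader.read_to_string(&mut text)?;
    Ok(text)
}

/// Hypothetical helper: load a file as raw bytes (no UTF-8 check on read)
/// and report where the UTF-8 breaks, if anywhere.
fn check_file(path: &Path) -> io::Result<Option<usize>> {
    let bytes = fs::read(path)?;
    Ok(first_invalid_utf8(&bytes))
}

fn main() -> io::Result<()> {
    // With an argument (e.g. a copy of text/AA/wiki_46), inspect that file.
    if let Some(arg) = env::args().nth(1) {
        match check_file(Path::new(&arg))? {
            Some(offset) => println!("{arg}: invalid UTF-8 at byte {offset}"),
            None => println!("{arg}: valid UTF-8"),
        }
        return Ok(());
    }

    // Demo input: "Б" (D0 91), a stray 0xFF byte, then "е" (D0 B5).
    let bytes: Vec<u8> = vec![0xD0, 0x91, 0xFF, 0xD0, 0xB5];
    println!("first invalid byte at {:?}", first_invalid_utf8(&bytes));

    // A strict read reproduces the kind of error seen in CI.
    let err = read_strict(Cursor::new(bytes.clone())).unwrap_err();
    assert_eq!(err.kind(), ErrorKind::InvalidData);
    println!("strict read failed: {err}");

    // A lossy decode replaces the bad byte with U+FFFD instead of failing.
    let lossy = String::from_utf8_lossy(&bytes);
    println!("lossy read: {lossy}");
    Ok(())
}
```

If the file turns out to be valid UTF-8 when checked on its own, the failure more likely comes from how the bytes are read or split during the run than from the Wikipedia dump itself. `String::from_utf8_lossy` would avoid the crash, but it silently replaces bad bytes, so it is better treated as a workaround than as a fix.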
Additional info:
- The author of the PR says it works locally; I haven't had time to check that on my machine yet.
- The export on merge will very likely fail as well because of this issue.