rowanz/merlot

Got `UnicodeDecodeError` when loading file `yttemporal180m_050of100.jsonl.gz`.


Files 050, 073, and 098 are corrupted. When I decode them using jsonline, the following error occurs:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdf in position 6778: invalid continuation byte

The problem was solved by passing `encoding='latin-1'` when loading the file.
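
For anyone hitting the same error, here is a minimal sketch of that workaround using only the standard library (`gzip` and `json`) instead of jsonline. The helper name `load_jsonl_gz` is hypothetical and not part of the merlot repo. Note that `latin-1` maps every byte to a character, so decoding never fails, but any genuinely UTF-8 multi-byte characters in the file will come out as mojibake.

```python
import gzip
import json


def load_jsonl_gz(path, encoding="latin-1"):
    """Read a gzipped JSON Lines file into a list of records.

    Hypothetical helper: latin-1 accepts any byte value, so stray bytes
    such as 0xdf no longer raise UnicodeDecodeError.
    """
    records = []
    # "rt" opens the gzip stream in text mode with the given encoding.
    with gzip.open(path, "rt", encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Usage (file name taken from the issue title):
# records = load_jsonl_gz("yttemporal180m_050of100.jsonl.gz")
```

If keeping valid UTF-8 text intact matters more than keeping the bad bytes, opening with `encoding="utf-8", errors="replace"` is an alternative; it substitutes U+FFFD for undecodable bytes instead of reinterpreting the whole file.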