BYVoid/uchardet

Windows-1251 detection failed on a file in Russian.

Jehan opened this issue · 1 comment

Jehan commented

I've added some test files in test/.
Among them, there is windows-1251-bulgarian.txt and windows-1251-russian.txt.
The Bulgarian text is correctly detected as Windows-1251, but the Russian one is detected as Mac Cyrillic.
Note that I have checked: neither encoding is a subset of the other, and the wrong detection actually breaks the text (easily verified by converting the encoding with iconv).
It would be worth improving our Russian models for Windows-1251.

I am opening this report so that we don't forget about it.
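
For anyone who wants to reproduce the iconv check without the test files, here is a minimal sketch in Python. It is not part of uchardet; the sample string and the helper name `decode_as` are made up for illustration, and it only relies on Python's built-in `cp1251` and `mac_cyrillic` codecs. It encodes a short Russian string as Windows-1251 and then decodes the same bytes under both encodings to show that the Mac Cyrillic reading does not give back the original text.

```python
# Hypothetical sample text; the real test file is test/windows-1251-russian.txt.
SAMPLE = "Привет, Мир"


def decode_as(raw: bytes, encoding: str) -> str:
    """Decode raw bytes with the given codec, replacing undecodable bytes."""
    return raw.decode(encoding, errors="replace")


# Bytes as they would be stored in a Windows-1251 file.
raw = SAMPLE.encode("cp1251")

# Decoding with the correct encoding round-trips the text.
as_windows_1251 = decode_as(raw, "cp1251")

# Decoding with the wrongly detected encoding yields different characters.
as_mac_cyrillic = decode_as(raw, "mac_cyrillic")

if __name__ == "__main__":
    print("Windows-1251:", as_windows_1251)
    print("Mac Cyrillic:", as_mac_cyrillic)
    print("Text preserved:", as_mac_cyrillic == SAMPLE)
```

This should be roughly equivalent to running iconv on the file once with `-f WINDOWS-1251` and once with the Mac Cyrillic source encoding, then comparing the two outputs.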

Jehan commented

Commit 4f1c3ff fixed a few detections, including this one.