PyYoshi/cChardet

Incorrect detection of GB18030 as ISO-8859-16

jayvdb opened this issue · 2 comments

OS/Arch

$ python -c 'import platform;print(platform.uname())'
('Linux', 'linux-lwww', '5.1.7-1-default', '#1 SMP Tue Jun 4 07:56:54 UTC 2019 (55f2451)', 'x86_64', 'x86_64')

Python version

$ python3 --version
Python 3.7.2

cChardet version

$ python -c 'import cchardet;print(cchardet.__version__)'
2.1.4

What is the problem?

b'\xc4\xe3\xba\xc3' is detected as {'encoding': 'ISO-8859-16', 'confidence': 0.3758675158023834}, which decodes to ÄășĂ.

Expected behavior

The bytes should be detected as GB18030, which decodes to 你好.
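
For reference, the two readings of these four bytes can be checked with Python's built-in codecs alone, without cChardet. This is only a minimal sketch; the variable names are illustrative, and it confirms what each encoding yields rather than saying anything about how the detector scores them.

```python
# The four bytes from the report.
data = b'\xc4\xe3\xba\xc3'

# Decode the same bytes with both candidate encodings (stdlib codecs only).
as_gb18030 = data.decode('gb18030')        # the expected reading
as_iso8859_16 = data.decode('iso8859_16')  # the reading implied by the detection result

print('GB18030:    ', as_gb18030)
print('ISO-8859-16:', as_iso8859_16)

# Encoding the expected text back to GB18030 reproduces the original bytes.
assert as_gb18030.encode('gb18030') == data
```

Both decodes succeed because every byte value used here is valid in ISO-8859-16, so the input is ambiguous at the byte level and only a statistical guess can tell the two apart.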

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.