Incorrect detection of GB18030 as ISO-8859-16
jayvdb opened this issue · 2 comments
jayvdb commented
OS/Arch
$ python -c 'import platform;print(platform.uname())'
('Linux', 'linux-lwww', '5.1.7-1-default', '#1 SMP Tue Jun 4 07:56:54 UTC 2019 (55f2451)', 'x86_64', 'x86_64')
Python version
$ python3 --version
Python 3.7.2
cChardet version
$ python -c 'import cchardet;print(cchardet.__version__)'
2.1.4
What is the problem?
b'\xc4\xe3\xba\xc3'
is detected as {'encoding': 'ISO-8859-16', 'confidence': 0.3758675158023834}
which renders as ÄășĂ
Expected behavior
It should be detected as GB18030, in which these bytes decode to 你好
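For anyone who wants to confirm the two renderings without installing cChardet, here is a minimal sketch that uses only Python's standard-library codecs. It does not call the detector. It only shows how the same four bytes decode under the reported encoding and under the expected one. The variable names are made up for illustration.

```python
# The four bytes from the report.
data = b'\xc4\xe3\xba\xc3'

# Decode with the encoding cChardet reported (ISO-8859-16)
# and with the encoding the report expects (GB18030).
as_iso8859_16 = data.decode('iso8859_16')
as_gb18030 = data.decode('gb18030')

print(as_iso8859_16)  # ÄășĂ
print(as_gb18030)     # 你好

# Round trip: encoding the expected text yields the original bytes.
assert as_gb18030.encode('gb18030') == data
```

Both decodes succeed because every byte value here is valid in ISO-8859-16, so the bytes alone cannot rule that encoding out. With only four bytes of input, the detector has very little to go on.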
stale commented
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.