matarrese/juniversalchardet

Incorrect encoding when the line contains two £ symbols followed by numbers


What steps will reproduce the problem?
1. Create a file containing the following line:
Wykamol,£588.95,0.18,0.12,testingSpecialised Products for DIY and Professionals£12
(Any text containing two pound signs followed by numbers will do, such as
Wykamol,£588.95£12)
2. Save the file as ANSI.
3.
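
For anyone trying to reproduce this, here is a minimal sketch that writes the shortened sample line to a temporary file. It assumes that "ANSI" in step 2 means windows-1252 (the usual Western European Windows code page); the class and method names are hypothetical and use only the JDK. In that encoding each pound sign is the single byte 0xA3, so the file should contain exactly two bytes above 0x7F.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class PoundSampleFile {
    // Assumption: "ANSI" in the report means windows-1252.
    static final Charset ANSI = Charset.forName("windows-1252");

    // Shortened sample from the report: two pound signs, each followed by digits.
    static final String SAMPLE = "Wykamol,\u00A3588.95\u00A312";

    /** Encodes the shortened sample line as windows-1252 bytes. */
    static byte[] sampleBytes() {
        return SAMPLE.getBytes(ANSI);
    }

    /** Counts bytes equal to 0xA3, the windows-1252 encoding of the pound sign. */
    static int countPoundBytes(byte[] data) {
        int count = 0;
        for (byte b : data) {
            if ((b & 0xFF) == 0xA3) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Write the sample to a temp file, then read it back to inspect the raw bytes.
        Path file = Files.createTempFile("pound-sample", ".csv");
        Files.write(file, sampleBytes());
        byte[] onDisk = Files.readAllBytes(file);
        System.out.println(file + ": " + onDisk.length + " bytes, "
                + countPoundBytes(onDisk) + " pound-sign bytes (0xA3)");
    }
}
```

The resulting file can then be passed to the detector to check whether GB18030 is still reported.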

What is the expected output? What do you see instead?
Expected Western European (Windows) or something similar, but the detected encoding is GB18030.

What version of the product are you using? On what operating system?

1.0.3

Please provide any additional information below.
I am not sure how the API is supposed to be used. I tried a simple file with a few
ANSI characters, such as "Find Encoding", and the API returned null as the encoding.
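
One difference between the two test files is worth checking: "Find Encoding" contains only 7-bit ASCII bytes, while the pound-sign line does not. The sketch below (hypothetical class and method names, JDK only) tests for that. Whether a pure-ASCII input is what makes the detector return null is an assumption here, not something confirmed in this report.

```java
class AsciiCheck {
    /** Returns true when every byte is below 0x80, i.e. the data is plain 7-bit ASCII. */
    static boolean isPureAscii(byte[] data) {
        for (byte b : data) {
            if ((b & 0xFF) >= 0x80) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumption: "ANSI" in the report means windows-1252.
        java.nio.charset.Charset ansi = java.nio.charset.Charset.forName("windows-1252");
        byte[] plain = "Find Encoding".getBytes(ansi);
        byte[] pounds = "Wykamol,\u00A3588.95\u00A312".getBytes(ansi);
        System.out.println("\"Find Encoding\" is pure ASCII: " + isPureAscii(plain));
        System.out.println("pound-sign sample is pure ASCII: " + isPureAscii(pounds));
    }
}
```

A pure-ASCII file carries no bytes that distinguish one ASCII-compatible encoding from another, so a test file for the detector probably needs at least one non-ASCII character.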




Original issue reported on code.google.com by vajr...@gmail.com on 12 Apr 2011 at 11:26

I got false positives from GB18030 too (issue 11).  I wonder if this is just an 
over-zealous GB18030 state machine.  I'd submit a fix but I'm having a hard 
time figuring out what needs to change in the GB18030SMModel.

Original comment by icw...@gmail.com on 13 Jul 2011 at 4:38