Encoding detection errors
jpcima opened this issue · 3 comments
jpcima commented
I think the encoding detection issues are largely resolved, but I'll drop some samples here that are still failing.
z2ow.mid.gz CP932
jpcima commented
A random set of files from vgmusic that are misdetected:
beachcave.mid.gz
ff1flcst.mid.gz
Mi%27Ihen_Highway.mid.gz
realemotion1.mid.gz
so2_hurry.mid.gz
jpcima commented
Idea for an algorithm for a new heuristic:
Let S be an input string of length N
Score ← 0
Counter ← 0
Script ← None
For each codepoint C of S:
    PreviousScript ← Script
    Script ← uscript_getScript(C)
    If Script ≠ PreviousScript:
        Counter ← 0
    Counter ← Counter + 1
    Score ← Score + Counter
Score ← Score / N