`utf8prober` confidence function magic number "6" breaks short UTF-8 detection.
lingsamuel opened this issue · 0 comments
// src.utf8prober.js
this.getConfidence = function() {
    var unlike = 0.99;
    if( this._mNumOfMBChar < 6 ) {
        for( var i = 0; i < this._mNumOfMBChar; i++ ) {
            unlike *= ONE_CHAR_PROB;
        }
        return 1 - unlike;
    } else {
        return unlike;
    }
}
Because of this magic number, UTF-8 text with fewer than 6 multibyte characters gets a reduced confidence that never beats the other probers.
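To make the effect concrete, here is a standalone sketch of the same logic that prints the confidence for 0 to 6 multibyte characters. It assumes `ONE_CHAR_PROB` is `0.5` (the value is not shown in the snippet above), and `utf8Confidence` is a name made up for this illustration, not part of the library.

```javascript
// Assumption: ONE_CHAR_PROB is 0.5; the snippet in this issue does not show its value.
var ONE_CHAR_PROB = 0.5;

// Hypothetical standalone copy of the getConfidence logic quoted above,
// taking the multibyte character count as a plain argument.
function utf8Confidence(numOfMBChar) {
    var unlike = 0.99;
    if (numOfMBChar < 6) {
        for (var i = 0; i < numOfMBChar; i++) {
            unlike *= ONE_CHAR_PROB;
        }
        return 1 - unlike;
    }
    return unlike;
}

// Print the confidence for each count from 0 up to the threshold.
for (var n = 0; n <= 6; n++) {
    console.log(n + " multibyte chars -> " + utf8Confidence(n).toFixed(4));
}
```

Under that assumption, one multibyte character yields roughly 0.5 and two yield roughly 0.75, while six or more jump straight to 0.99, so short inputs are penalized no matter how cleanly they decode as UTF-8.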
A simple fix is to add a check on the ratio of multibyte characters.
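A rough sketch of what such a ratio check could look like, not a tested patch. `utf8ConfidenceWithRatio`, the `numOfChars` argument, and the `MIN_MB_RATIO` threshold are all hypothetical; the prober would need to track the total character count itself, and `ONE_CHAR_PROB = 0.5` is again an assumption.

```javascript
// Assumption: ONE_CHAR_PROB is 0.5; the snippet in this issue does not show its value.
var ONE_CHAR_PROB = 0.5;
// Hypothetical threshold: share of characters that must be multibyte.
var MIN_MB_RATIO = 0.5;

// Hypothetical variant of getConfidence that also looks at the ratio of
// multibyte characters to all characters seen so far.
function utf8ConfidenceWithRatio(numOfMBChar, numOfChars) {
    var unlike = 0.99;
    if (numOfMBChar >= 6) {
        return unlike;
    }
    // Short input that is mostly valid multibyte sequences gets the
    // same confidence as long input.
    if (numOfChars > 0 && numOfMBChar / numOfChars >= MIN_MB_RATIO) {
        return unlike;
    }
    // Otherwise fall back to the original count-based estimate.
    for (var i = 0; i < numOfMBChar; i++) {
        unlike *= ONE_CHAR_PROB;
    }
    return 1 - unlike;
}

console.log(utf8ConfidenceWithRatio(2, 2));   // short, all multibyte
console.log(utf8ConfidenceWithRatio(2, 100)); // mostly single-byte
```

With this shape, a two-character string made entirely of multibyte characters is no longer capped below the longer-text confidence, while mostly-ASCII input with a couple of multibyte characters keeps the original behavior. The right threshold would need testing against the other probers.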