brianmario/charlock_holmes

Small Strings

etm opened this issue


CharlockHolmes::EncodingDetector.detect_all("Timeout: 2")

results in

{:type=>:text, :encoding=>"IBM424_ltr", :ruby_encoding=>"binary", :confidence=>27, :language=>"he"},
{:type=>:text, :encoding=>"UTF-8", :ruby_encoding=>"UTF-8", :confidence=>15},
....

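In general, the detector seems to try too hard on small strings. For small strings it often favors esoteric (and wrong) results over obvious ones: here a plain ASCII string is reported as IBM424_ltr with higher confidence than UTF-8.
Is this intended? Is it possible to tweak this behavior?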
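Until the detector itself handles this, one possible workaround is to skip statistical detection for inputs where it has little evidence to work with ("Timeout: 2" is only 10 bytes). The sketch below is a hypothetical helper, not part of charlock_holmes: `pick_encoding` and the `MIN_DETECT_BYTES` threshold of 32 are assumptions chosen for illustration. It returns "UTF-8" for pure ASCII input and for short input that is already valid UTF-8, and otherwise falls back to the highest-confidence candidate from a `detect_all`-style result array.

```ruby
# Hypothetical helper, not part of charlock_holmes.
# Assumed threshold: below this many bytes, trust a valid-UTF-8 check
# over the statistical detector. Tune for your own data.
MIN_DETECT_BYTES = 32

# str        - the raw input string
# candidates - an array of hashes shaped like the output of
#              CharlockHolmes::EncodingDetector.detect_all
def pick_encoding(str, candidates)
  bytes = str.dup.force_encoding(Encoding::BINARY)

  # Pure 7-bit ASCII is valid UTF-8, so report UTF-8 instead of a
  # low-evidence guess such as IBM424_ltr.
  return "UTF-8" if bytes.ascii_only?

  # Short non-ASCII input that already decodes as valid UTF-8 is kept as UTF-8.
  if bytes.bytesize < MIN_DETECT_BYTES &&
     str.dup.force_encoding(Encoding::UTF_8).valid_encoding?
    return "UTF-8"
  end

  # Otherwise defer to the detector's highest-confidence candidate (nil if none).
  best = candidates.max_by { |c| c[:confidence] }
  best && best[:encoding]
end

# Candidates copied from the detect_all output above.
candidates = [
  { type: :text, encoding: "IBM424_ltr", ruby_encoding: "binary", confidence: 27, language: "he" },
  { type: :text, encoding: "UTF-8", ruby_encoding: "UTF-8", confidence: 15 }
]
puts pick_encoding("Timeout: 2", candidates) # prints UTF-8
```

With the gem installed, the candidates would come from `CharlockHolmes::EncodingDetector.detect_all(str)`. The design choice is deliberately conservative: it only overrides the detector when the bytes are already unambiguous as UTF-8, so longer or non-UTF-8 inputs still get the detector's answer.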