brianmario/charlock_holmes

Wrong detection for simple string "nbs" - U_FILE_ACCESS_ERROR (ArgumentError)

iuri-gg opened this issue · 0 comments

When I run CharlockHolmes::EncodingDetector.detect "nbs", I get a detection with fairly high confidence (75), but it is wrong: {:type=>:text, :encoding=>"IBM420_ltr", :ruby_encoding=>"binary", :confidence=>75, :language=>"ar"}.

Moreover, when I try to convert that string to UTF-8 using the detected encoding, I get a U_FILE_ACCESS_ERROR (ArgumentError). The code is below:

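require "charlock_holmes"

input = "nbs"
encoding = Encoding::UTF_8
# Returns encoding "IBM420_ltr" with confidence 75 for this input
detection = CharlockHolmes::EncodingDetector.detect(input)
# Raises ArgumentError (U_FILE_ACCESS_ERROR)
CharlockHolmes::Converter.convert(input, detection[:encoding], encoding.to_s)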

I am using Ruby 3.3.4 and version 0.7.8 of the gem.

Have I come across a bug, or am I using it wrong?
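
For anyone hitting the same thing, here is a minimal sketch of a guard that could sit in front of the conversion step. It rests on two assumptions rather than on anything documented by the gem: that very short input gives the detector too little evidence to be trusted, and that a 7-bit ASCII string never needs conversion because it is already valid UTF-8. The helper name convert_with_detection? and the threshold of 80 are hypothetical, chosen only for illustration; the detection hash is the one reported above, written out by hand so the snippet runs without the gem.

```ruby
# Hypothetical guard, not part of charlock_holmes: decide whether a detection
# result is worth passing on to CharlockHolmes::Converter.convert.
MIN_CONFIDENCE = 80 # arbitrary threshold, an assumption for this sketch

def convert_with_detection?(input, detection, min_confidence: MIN_CONFIDENCE)
  # Pure 7-bit ASCII is already valid UTF-8, so no conversion is needed.
  return false if input.ascii_only?
  # No detection result at all: nothing to convert with.
  return false if detection.nil?
  # Treat a weak guess the same as no guess.
  detection[:confidence].to_i >= min_confidence
end

input = "nbs"
# The result reported in this issue, copied in as a literal hash.
detection = { type: :text, encoding: "IBM420_ltr", ruby_encoding: "binary",
              confidence: 75, language: "ar" }

puts convert_with_detection?(input, detection) # prints "false"
```

When the guard returns true, the Converter.convert call could still be wrapped in a rescue for ArgumentError, since that is the exception class raised in the report above.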