runk/node-chardet

Different encodings for different buffer lengths

Closed this issue · 2 comments

doowb commented

Hi,

This module is used in external-editor, and I ran into a problem because it sometimes returns an encoding that's not compatible with iconv-lite.

I'm using vim on macOS as the editor, and I get the following encodings for the following strings:

  • hi\n => UTF-32LE
  • hi\n\n => UTF-8
  • hi this is a longer string\n => ISO-8859-1

I'll also open an issue on external-editor in case these are the correct results and it should be handled there.

runk commented

I'm not sure there's a quick solution to your problem. The module uses statistical analysis of the binary data: based on the occurrences of certain bytes (and byte sequences), it estimates which encoding is most likely. There are cases where the encoding can be determined with 100% accuracy, but those are the exception.
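To illustrate why short inputs leave a detector with so little to go on, here is a minimal sketch using only Node's built-in `Buffer`. It does not use or reproduce this module's internals; it only shows that the bytes of an ASCII-only string such as `hi\n` decode to the same text under more than one encoding, so the bytes alone cannot distinguish between them.

```javascript
// The bytes of a short ASCII-only string, as an editor might save them.
const bytes = Buffer.from('hi\n', 'utf8');

// ASCII bytes (0x00-0x7F) mean the same thing in UTF-8 and ISO-8859-1,
// so decoding under either encoding yields identical text.
const asUtf8 = bytes.toString('utf8');
const asLatin1 = bytes.toString('latin1'); // Node's name for ISO-8859-1

// True: nothing in these bytes favors one encoding over the other.
const ambiguous = asUtf8 === asLatin1;
console.log(ambiguous); // prints true
```

With input like this, any answer is a guess among equally valid candidates, which is presumably why the result can change as the buffer grows.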

doowb commented

Thanks for the quick response. Since this is using probabilities to determine the encoding, I think the fix should be in the other library.
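For reference, the kind of handling I have in mind on the consuming side could look roughly like this. It is only a sketch: `decodeWithFallback` is a hypothetical helper, not something external-editor or iconv-lite provides, and it uses Node's built-in `TextDecoder` as a stand-in for whatever decoder the library actually uses. The idea is to treat the detected encoding as a hint and fall back to a default when the decoder rejects it.

```javascript
// Hypothetical helper: try the detected encoding first, and fall back
// to a default when the decoder does not support that label.
function decodeWithFallback(buffer, detected, fallback = 'utf-8') {
  let decoder;
  try {
    // TextDecoder throws for encoding labels it does not support.
    decoder = new TextDecoder(detected);
  } catch (err) {
    decoder = new TextDecoder(fallback);
  }
  return decoder.decode(buffer);
}

// Usage: an unsupported label no longer breaks decoding.
const text = decodeWithFallback(Buffer.from('hi\n'), 'not-a-real-encoding');
console.log(JSON.stringify(text)); // prints "hi\n"
```

Falling back to UTF-8 is an assumption here; the right default depends on where the text comes from.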