Different encodings for different buffer lengths
Closed this issue · 2 comments
Hi,
This module is used in external-editor and I was having an issue because sometimes an encoding comes back that's not compatible with iconv-lite.
I'm using vim on a Mac as the editor, and I get the following encodings for the following strings:
hi\n => UTF-32LE
hi\n\n => UTF-8
hi this is a longer string\n => ISO-8859-1
I'll also open an issue on external-editor in case these are the correct results and it should be handled there.
Not sure there's a quick solution for your problem. The thing is, the module runs a statistical analysis of the binary data: based on how often certain bytes (and byte sequences) occur, it estimates which encoding is most likely. There are cases where the encoding can be determined with 100% accuracy, but those are the exception.
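To illustrate why a detector has to guess at all, here is a minimal sketch. It is not the module's actual algorithm; it only uses the standard `TextDecoder` to show that the bytes of a short ASCII string decode identically under more than one encoding, so the bytes alone cannot say which one was intended.

```typescript
// Illustrative only: this is NOT how the detection module works internally.
// It shows that one byte sequence can be valid in several encodings at once.
const bytes = new Uint8Array([0x68, 0x69, 0x0a]); // "hi\n" as ASCII bytes

// `fatal: true` makes decode() throw if the bytes are not valid UTF-8.
const asUtf8 = new TextDecoder("utf-8", { fatal: true }).decode(bytes);

// The "latin1" label maps every byte to a character, so it never rejects input.
const asLatin1 = new TextDecoder("latin1").decode(bytes);

// Both decodes succeed and agree, so the bytes carry no evidence for either one.
const ambiguous = asUtf8 === asLatin1;
console.log(ambiguous);
```

With so little evidence in a short buffer, a probability-based detector may reasonably rank different encodings first for different inputs.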
Thanks for the quick response. Since this is using probabilities to determine the encoding, I think the fix should be in the other library.
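One way the calling library could handle this is to treat the detected encoding as a hint and fall back when the decoder does not support it. The sketch below is an assumption about what such a guard might look like: `pickEncoding`, the `isSupported` predicate, and the list of supported names are all hypothetical and are not taken from either library.

```typescript
// Hypothetical helper: neither library is confirmed to expose anything like this.
function pickEncoding(
  detected: string | null,
  isSupported: (name: string) => boolean,
  fallback: string = "utf8",
): string {
  // Use the detector's guess only when the decoder can actually handle it.
  return detected !== null && isSupported(detected) ? detected : fallback;
}

// Stand-in for the decoder's own support check; this list is made up for the demo.
const supported = new Set<string>(["utf8", "UTF-8", "ISO-8859-1"]);
const isSupported = (name: string): boolean => supported.has(name);

console.log(pickEncoding("UTF-32LE", isSupported));
console.log(pickEncoding("ISO-8859-1", isSupported));
```

In a real caller, the predicate would presumably wrap whatever support check the decoding library provides (iconv-lite documents an `encodingExists` function), and the fallback would be whatever default the application already assumes.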