brianmario/charlock_holmes

Best practice for large files?

machty opened this issue · 0 comments

The charlock_holmes API seems to be string-centric. If I have a 50MB file that consists mostly of typical alphabetic/ASCII characters, with only a few non-ASCII characters that distinguish the encoding, what's the best way to detect the entire file's encoding without loading the 50MB file (or a larger one) into memory?