muradgaribzada/juniversalchardet

Need to know the size of BOM


When a file starts with a Byte Order Mark (BOM), there needs to be a way to
discard those bytes. The detected charset alone is not enough information,
because a file in that charset may or may not begin with a BOM.

The easy way would be a method indicating the number of bytes to skip.
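
As a rough illustration of what such a method could compute, here is a minimal sketch that inspects the leading bytes directly. `BomSniffer` and `bomLength` are hypothetical names made up for this example and are not part of juniversalchardet; the byte sequences are the standard Unicode BOMs.

```java
import java.util.Arrays;

final class BomSniffer {
    // Standard BOM byte sequences. The 4-byte ones come first so that
    // UTF-32LE (FF FE 00 00) is not reported as UTF-16LE (FF FE).
    // Caveat: FF FE 00 00 could also be a UTF-16LE BOM followed by U+0000;
    // this sketch assumes UTF-32LE in that case.
    private static final byte[][] BOMS = {
        {(byte) 0x00, (byte) 0x00, (byte) 0xFE, (byte) 0xFF}, // UTF-32BE
        {(byte) 0xFF, (byte) 0xFE, (byte) 0x00, (byte) 0x00}, // UTF-32LE
        {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF},              // UTF-8
        {(byte) 0xFE, (byte) 0xFF},                           // UTF-16BE
        {(byte) 0xFF, (byte) 0xFE},                           // UTF-16LE
    };

    private BomSniffer() {}

    /** Returns the number of leading BOM bytes in head, or 0 if there is none. */
    static int bomLength(byte[] head) {
        for (byte[] bom : BOMS) {
            if (head.length >= bom.length
                    && Arrays.equals(Arrays.copyOf(head, bom.length), bom)) {
                return bom.length;
            }
        }
        return 0;
    }
}
```

A caller would read the first four bytes of the file, call `BomSniffer.bomLength(...)`, and skip that many bytes before handing the stream to a reader.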

What steps will reproduce the problem?
1. Run the universal detector on a file that starts with a BOM, such as a UTF-16LE file
2. Open a reader using the detected charset
3. Observe the spurious first character
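
Steps 2 and 3 can be reproduced with the JDK alone. The sketch below skips the detector and hard-codes UTF-16LE as an assumed stand-in for its result; `BomRepro` and `firstChar` are hypothetical names used only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class BomRepro {
    private BomRepro() {}

    /** Opens a UTF-16LE reader over data and returns the first char read, or -1 if empty. */
    static int firstChar(byte[] data) {
        // UTF-16LE is assumed here in place of the charset the detector would report.
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_16LE)) {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "Hi" encoded as UTF-16LE, preceded by the FF FE byte order mark.
        byte[] data = {(byte) 0xFF, (byte) 0xFE, 'H', 0, 'i', 0};
        // The BOM is not consumed by the UTF-16LE decoder, so it shows up as U+FEFF.
        System.out.println(Integer.toHexString(firstChar(data))); // prints "feff"
    }
}
```

The first character read is U+FEFF rather than `H`, which is the spurious character described in step 3.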

Original issue reported on code.google.com by marcus.downing on 29 Apr 2011 at 12:08