utf-8 file with BOM passes BOM as part of first header NAME
Closed this issue · 2 comments
ccoulter commented
If my file/stream starts with the UTF-8 BOM (the 3 bytes \xef\xbb\xbf), it is passed through as part of the first header name (or the first data value on the first row).
Should unicodecsv handle this (remove it), or should the user sniff for and skip over it before instantiating the UnicodeReader class?
What do you think is the best way to handle this? FWIW, I'm using Python 2.7.
jdunck commented
I'm pretty sure you can construct your reader with 'utf-8-sig' rather than 'utf-8', and the codec will strip the BOM for you.
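For illustration, here is a minimal sketch of the difference between the two codecs. It uses only the standard library on Python 3 rather than unicodecsv on 2.7, so treat it as a demonstration of the codec's behavior, not of unicodecsv itself. The sample bytes and the `read_header` helper are made up for the example.

```python
import csv
import io

# Hypothetical sample: a UTF-8 BOM followed by CSV data, as a Windows tool might write it.
RAW = b"\xef\xbb\xbfname,city\r\nAda,London\r\n"


def read_header(raw, encoding):
    """Decode raw CSV bytes with the given codec and return the first row."""
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    return next(csv.reader(text))


# With plain 'utf-8' the BOM survives decoding as U+FEFF on the first field.
plain = read_header(RAW, "utf-8")

# With 'utf-8-sig' the codec drops a leading BOM while decoding.
stripped = read_header(RAW, "utf-8-sig")

print(repr(plain[0]))
print(repr(stripped[0]))
```

'utf-8-sig' also decodes input that has no BOM, so it should be safe to use when only some of the files come from Windows.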
ccoulter commented
Thanks. That seems to have fixed it. Most of my CSVs are generated on Windows even though I'm processing them on Linux.