Check input file for BOM / Byte Order Mark (REGRESSION?)
tilo opened this issue · 2 comments
tilo commented
Some CSV files begin with a Byte Order Mark (BOM):
https://en.wikipedia.org/wiki/Byte_order_mark
e.g.
$ hexdump -C /tmp/sample.csv
00000000 ef bb bf 75 73 65 72 5f 69 64 2c 74 79 70 65 2c |...user_id,type,|
00000010 6d 65 74 61 6c 5f 70 69 64 0d 0a 34 33 32 31 30 |metal_pid..43210|
00000020 38 30 35 2c 72 65 69 73 73 75 65 2c 31 32 33 34 |805,reissue,1234|
The first 3 bytes (ef bb bf) are the UTF-8 BOM and should be ignored when the file is read.
BOMs for the common Unicode encodings:
* UTF-8 with BOM: EF BB BF
* UTF-16BE (big-endian): FE FF
* UTF-16LE (little-endian): FF FE
* UTF-32BE (big-endian): 00 00 FE FF
* UTF-32LE (little-endian): FF FE 00 00
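
A minimal sketch of how the check could work, using the byte sequences listed above. It is written in Python for illustration only and is not this library's implementation; the names `BOMS`, `detect_bom`, and `strip_bom` are hypothetical. The BOM constants come from Python's standard `codecs` module.

```python
import codecs

# Known BOMs, longest first. Order matters: the UTF-32LE BOM (FF FE 00 00)
# starts with the UTF-16LE BOM (FF FE), so UTF-32 must be tested before UTF-16.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def detect_bom(data):
    """Return (encoding, bom_length) for the leading bytes, or (None, 0)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def strip_bom(data):
    """Return data with any leading BOM removed."""
    _, length = detect_bom(data)
    return data[length:]
```

For the sample above, `detect_bom` on the first bytes of the file would report a 3-byte UTF-8 BOM, and `strip_bom` would return the content starting at `user_id`. One caveat of this approach: a UTF-16LE file whose first character is NUL would be misread as UTF-32LE, which is unlikely for CSV data.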
tilo commented
HINT: this is typically caused by some Microsoft tools.
One way to fix the file is to run dos2unix filename, which by default removes the BOM (and also converts CRLF line endings to LF).
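
If rewriting the file is not an option, the BOM can also be skipped at read time. The sketch below is a Python illustration, not this library's code, and the function name `read_csv_rows` is hypothetical. It relies on Python's `utf-8-sig` codec, which drops a leading UTF-8 BOM if one is present and otherwise decodes as plain UTF-8. It only covers the UTF-8 case.

```python
import csv
import io


def read_csv_rows(raw):
    """Decode UTF-8 CSV bytes, ignoring a leading BOM, and return the rows."""
    # "utf-8-sig" strips EF BB BF when present; input without a BOM is unaffected.
    text = raw.decode("utf-8-sig")
    # newline="" leaves line endings untouched so the csv module handles CRLF itself.
    return list(csv.reader(io.StringIO(text, newline="")))


# Sample bytes modeled on the hexdump in this issue (BOM + header + one row).
sample = b"\xef\xbb\xbfuser_id,type,metal_pid\r\n43210805,reissue,1234\r\n"
rows = read_csv_rows(sample)
```

Without the BOM handling, the first header would come back as "\ufeffuser_id" instead of "user_id", which is what breaks lookups by column name.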