incorrect interpretation of multi-byte characters in GenericRecord
Running
uf-log e87_1a1.raw
gives this output:
Spray Current (뗂䄀⤀ 0.064699240089114
The unit in the label should read (µA), as in the descriptor dump:
- 248) entry[6] (48 bytes)
0) type= 11: Finnigan data type (4 bytes)
4) length= 2: object length (where it varies) (4 bytes)
+ 8) label= Spray Current (µA): Descriptor label (40 bytes)
This is caused by inadequate decoding of the string types from Windows UTF-16LE
with zero-byte truncation.
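To illustrate the failure mode, here is a minimal Python sketch. It is not the project's code, and `decode_utf16le_field` is a hypothetical helper. It assumes the label bytes are UTF-16LE text that may be followed by zero padding. The point is that truncation has to happen on 16-bit code units after decoding, because removing or splitting on single zero bytes breaks any character outside ASCII, such as µ (bytes `B5 00`).

```python
def decode_utf16le_field(raw: bytes) -> str:
    """Decode UTF-16LE text, stopping at the first NUL character.

    Hypothetical helper for illustration only; the byte layout it
    expects (text followed by optional zero padding) is an assumption.
    """
    # Work on whole 2-byte code units; ignore a dangling odd byte.
    usable = len(raw) - (len(raw) % 2)
    text = raw[:usable].decode("utf-16-le", errors="replace")
    # Truncate at the first U+0000 character, not at the first zero byte.
    return text.split("\x00", 1)[0]


label = "Spray Current (\u00b5A)"  # \u00b5 is the micro sign
raw = label.encode("utf-16-le") + b"\x00\x00\x00\x00"  # assumed padding

# Byte-level shortcut: dropping zero bytes leaves a lone 0xB5,
# which is Latin-1 for the micro sign but is not valid UTF-8.
naive = raw.replace(b"\x00", b"")

decoded = decode_utf16le_field(raw)
print(decoded == label)  # True
```

Here `naive` ends up as `b'Spray Current (\xb5A)'`, so any later step that treats those bytes as UTF-8 or UTF-16 will mangle the unit, while the code-unit-aware version returns the label intact.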
Original issue reported on code.google.com by selko...@gmail.com
on 8 Dec 2011 at 1:59