prvst/unfinnigan

incorrect interpretation of multi-byte characters in GenericRecord

GoogleCodeExporter opened this issue

Running 

  uf-log e87_1a1.raw 

gives this output:

Spray Current (뗂䄀⤀    0.064699240089114

The label should read "Spray Current (µA)", as the descriptor entry shows:

       - 248) entry[6] (48 bytes)
            0) type= 11: Finnigan data type (4 bytes)
            4) length= 2: object length (where it varies) (4 bytes)
          + 8) label= Spray Current (µA): Descriptor label (40 bytes)
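The entry layout in the dump above is a 4-byte type, a 4-byte length and a 40-byte label field, 48 bytes in total. It can be read with a fixed-width unpack. Below is a minimal Python sketch, not unfinnigan's own code. It assumes little-endian integers, which is usual for files written on Windows, and `read_entry` is a hypothetical helper:

```python
import struct
from typing import NamedTuple

# Layout taken from the dump: type (4 bytes), length (4 bytes), label (40 bytes).
# Little-endian unsigned 32-bit integers are an assumption, not confirmed here.
ENTRY_FORMAT = "<II40s"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 48 bytes


class Entry(NamedTuple):
    data_type: int
    length: int
    raw_label: bytes  # label bytes still encoded, zero padding included


def read_entry(buf: bytes, offset: int = 0) -> Entry:
    """Unpack one 48-byte descriptor entry starting at `offset` (hypothetical helper)."""
    data_type, length, raw_label = struct.unpack_from(ENTRY_FORMAT, buf, offset)
    return Entry(data_type, length, raw_label)
```

`raw_label` is left as raw bytes on purpose. Turning those 40 bytes into text is the step that goes wrong in this issue.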

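This is caused by inadequate decoding of the string types. They are stored as Windows UTF-16LE in a fixed-width field and end at a zero terminator, so they must be decoded as UTF-16LE and truncated at the first zero character.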
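A minimal Python sketch of that decoding, not the project's actual fix. It walks the fixed-width field two bytes at a time, stops at the first all-zero code unit, and decodes only the text before it as UTF-16LE. Checking single zero bytes instead would be wrong: every ASCII or Latin-1 character has a zero high byte in UTF-16LE (for example "A" is 41 00 and "µ" is B5 00). The name `decode_label` is hypothetical.

```python
def decode_label(raw: bytes) -> str:
    """Decode a fixed-width, zero-terminated Windows UTF-16LE string field."""
    if len(raw) % 2:
        raise ValueError("UTF-16LE field must have an even number of bytes")
    end = len(raw)
    # Look for the terminator one 2-byte code unit at a time, never at odd
    # offsets, so a zero high byte (e.g. "A" = 41 00) is not mistaken for it.
    for i in range(0, len(raw), 2):
        if raw[i] == 0 and raw[i + 1] == 0:
            end = i
            break
    # Decode only the text before the terminator; padding after it is ignored.
    return raw[:end].decode("utf-16-le")
```

Applied to the 40-byte label field of entry[6] above, this should yield "Spray Current (µA)" rather than the garbled label in the uf-log output.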

Original issue reported on code.google.com by selko...@gmail.com on 8 Dec 2011 at 1:59