gopro/gpmf-parser

Types, update README/documentation

blipmusic opened this issue · 6 comments

Could the following types be added to the documentation?

'u'/117/GPMF_TYPE_STRING_UTF8 and '#'/35/GPMF_TYPE_COMPRESSED (I've only found '#' in GoPro Max so far?)

For those of us trying to implement their own parser, should we assume "internal types" 0xfe and 0xff can be ignored or will that break parsing?

The camera currently doesn't implement any UTF-8 datatypes. These variable-length datatypes are potentially problematic in GPMF: a type-size-repeat field of 'u',1,10 may not contain 10 characters, because the data size of each entry is content dependent. Since size x repeat must equal the storage size, for UTF-8 the repeat value will not automatically be the number of entries. It works, just less pretty.
Another less pretty example is compression. Compression typically reduces the data size by using variable-length encoding, which is what the '#' datatype signifies. The compression is delta-Huffman-run-length encoding, a little like JPEG entropy encoding. It is not documented, but you can view the encoding side in https://github.com/gopro/gpmf-write/blob/master/GPMF_writer.c
You should not see either 0xfe or 0xff types in a stored GPMF bitstream.
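For readers writing their own parser, the size x repeat rule can be sketched in C roughly as follows. This assumes the KLV header layout described in the gpmf-parser README (a 4-byte FourCC key, a 1-byte type, a 1-byte structure size, and a 2-byte big-endian repeat, with payloads padded to 32-bit alignment). The struct and function names are hypothetical and are not part of gpmf-parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header struct for illustration; not a gpmf-parser type. */
typedef struct {
    char     key[5];   /* FourCC plus a NUL terminator for printing */
    uint8_t  type;     /* e.g. 'c', 'u', '#', 's' */
    uint8_t  size;     /* bytes per sample structure */
    uint16_t repeat;   /* number of structures (big-endian in the stream) */
} klv_header;

/* Decode the 8-byte KLV header starting at p. */
static klv_header klv_read_header(const uint8_t *p)
{
    klv_header h;
    assert(p != NULL);
    memcpy(h.key, p, 4);
    h.key[4] = '\0';
    h.type   = p[4];
    h.size   = p[5];
    h.repeat = (uint16_t)((p[6] << 8) | p[7]);
    return h;
}

/* Storage size of the payload: always size x repeat, whatever the type. */
static size_t klv_payload_bytes(const klv_header *h)
{
    return (size_t)h->size * h->repeat;
}

/* Bytes the payload occupies in the stream: rounded up to a multiple of 4. */
static size_t klv_padded_bytes(const klv_header *h)
{
    return (klv_payload_bytes(h) + 3u) & ~(size_t)3u;
}
```

For a header of 'u',1,10, klv_payload_bytes returns 10. That is the number of bytes to read or skip, not necessarily the number of characters.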

The camera currently doesn't implement any UTF-8 datatypes. These variable-length datatypes are potentially problematic in GPMF: a type-size-repeat field of 'u',1,10 may not contain 10 characters, because the data size of each entry is content dependent. Since size x repeat must equal the storage size, for UTF-8 the repeat value will not automatically be the number of entries. It works, just less pretty.

Right, difficult with varying code point length in UTF-8 of course. Interpretation of UTF-8 data would have to be a special case, I guess, i.e. not as clean as for the rest of the types.
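To illustrate the special case, here is a minimal sketch in C that counts the characters actually present in a 'u' payload, given only its byte length (size x repeat). It assumes the payload is well-formed UTF-8 and does no validation; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count UTF-8 code points in a payload of nbytes bytes.
 * Continuation bytes have the bit pattern 10xxxxxx and are skipped,
 * so only the first byte of each encoded code point is counted.
 * Assumes well-formed UTF-8; no validation is performed. */
static size_t utf8_count_code_points(const uint8_t *data, size_t nbytes)
{
    size_t count = 0;
    assert(data != NULL || nbytes == 0);
    for (size_t i = 0; i < nbytes; i++) {
        if ((data[i] & 0xC0u) != 0x80u)
            count++;
    }
    return count;
}
```

For example, "m/s²" encoded as UTF-8 occupies 5 bytes but contains 4 code points, so the byte count from the header and the character count differ.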

I did have odd encounters with ASCII strings, however, where some values have size 1, repeat X, and others size X, repeat 1. I think this was for Hero7 Black. So for the header, I interpret both "size" and "repeat" as UInt16 and switch them around whenever I encounter type "c"/99. All c/ASCII values are then treated as size X, repeat 1 (this works better in my case, but will probably mean future bugs...).

A follow-up question on ASCII and encoding: these values go outside the standard range, all the way up to 255 if I read correctly. Is there a specific extended ASCII code page involved that I should go by?

Another less pretty example is compression. Compression typically reduces the data size by using variable-length encoding, which is what the '#' datatype signifies. The compression is delta-Huffman-run-length encoding, a little like JPEG entropy encoding. It is not documented, but you can view the encoding side in https://github.com/gopro/gpmf-write/blob/master/GPMF_writer.c

Thanks, I pondered whether I should try to decode or not, but I'll take a look regardless, thanks for the link.

You should not see either 0xfe or 0xff types in a stored GPMF bitstream.

Noted, thanks!

All 'c' type character arrays should now be in the format size 'X' repeat 1, but you are correct that this wasn't historically consistent, with some developers thinking repeat = strlen(input);. Because repeat can have a temporal meaning in some contexts, we worked to have this cleared up. Fortunately there has yet to be any time-varying stream that uses type 'c', so all existing string array sizes can be computed as 'size x repeat'. Only a single-byte ASCII code is supported within a type 'c' -- 0 to 255.
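As a rough sketch of what this means for a parser, the C function below reads a type 'c' payload using only the product size x repeat, so both historical layouts give the same string. It relies on the statement above that no time-varying stream uses type 'c'. The function name and the truncation behaviour are assumptions made for illustration, not gpmf-parser API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a type 'c' payload into dst as a NUL-terminated string.
 * The length is size x repeat, so size 1 / repeat X and
 * size X / repeat 1 produce the same result.
 * Returns the number of bytes copied, truncated to dst_cap - 1. */
static size_t copy_c_payload(char *dst, size_t dst_cap,
                             const uint8_t *payload,
                             uint8_t size, uint16_t repeat)
{
    size_t len = (size_t)size * repeat;
    assert(dst != NULL && dst_cap > 0 && payload != NULL);
    if (len > dst_cap - 1)
        len = dst_cap - 1;
    memcpy(dst, payload, len);
    dst[len] = '\0';
    return len;
}
```

Calling it with (size 1, repeat 5) or (size 5, repeat 1) on the same five payload bytes yields the same five-character string, which avoids having to swap the two header fields.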

All 'c' type character arrays should now be in the format size 'X' repeat 1, but you are correct that this wasn't historically consistent, with some developers thinking repeat = strlen(input);. Because repeat can have a temporal meaning in some contexts, we worked to have this cleared up. Fortunately there has yet to be any time-varying stream that uses type 'c', so all existing string array sizes can be computed as 'size x repeat'.

Aha, so there was a reason for this. Good to know what to expect from now on, thanks.

Only a single-byte ASCII code is supported within a type 'c' -- 0 to 255.

Right, but standard ASCII is 7-bit, so it only goes up to 127? I'm quite possibly out of the loop here, but 128-255 was historically used quite flexibly to extend the standard character set for various languages. Since that range of characters doesn't obey a single standard, it's often difficult to predict ASCII above 127. Is GoPro perhaps using something like Windows-1252 or ISO-8859-1? Or am I misunderstanding?

Yes, we haven't specified, although we are currently using ISO 8859-1. Within ACCL we use "m/s²", where superscript two (code 0xB2) stands for squared.
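A minimal sketch in C of turning such ISO 8859-1 bytes into UTF-8 for display follows. In ISO 8859-1 each byte value equals its Unicode code point, but a lone byte such as 0xB2 is a continuation byte in UTF-8 and is not valid on its own, so bytes of 0x80 and above need to be re-encoded as two bytes. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert n ISO 8859-1 bytes to NUL-terminated UTF-8 in dst.
 * Each input byte is one code point in U+0000..U+00FF; bytes >= 0x80
 * expand to two UTF-8 bytes. dst must hold at least 2 * n + 1 bytes.
 * Returns the number of UTF-8 bytes written, excluding the NUL. */
static size_t latin1_to_utf8(char *dst, const uint8_t *src, size_t n)
{
    size_t out = 0;
    assert(dst != NULL && (src != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        uint8_t b = src[i];
        if (b < 0x80) {
            dst[out++] = (char)b;                    /* plain ASCII */
        } else {
            dst[out++] = (char)(0xC0 | (b >> 6));    /* lead byte */
            dst[out++] = (char)(0x80 | (b & 0x3F));  /* continuation */
        }
    }
    dst[out] = '\0';
    return out;
}
```

With the ACCL unit string, the four bytes 'm', '/', 's', 0xB2 become five UTF-8 bytes ending in 0xC2 0xB2.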

Great, thanks! Yes, I noticed the superscript, which was actually what had me wondering in the first place, since it was decoded correctly for my UTF-8 strings. This makes sense, since the first 256 Unicode code points are identical to ISO 8859-1, making it an excellent choice for single-byte encoding. I'll decode 'c' as ISO 8859-1, mapping each byte to the code point of the same value, and it should work fine. :)