Simple tool to parse and extract data from binary files given their structure. Nesting of different structures and unknown structure sizes are supported.
Uses collections.OrderedDict, struct.calcsize, struct.unpack
bin_struct is a an OrderedDict constructed either by a single string argument that gets passed to struct.unpack or multiple key=value pairs representing the struct(ure) of the data.
The keys will be the representative names amd can be anything you specify.
The values can be strings for struct.unpack or another initialized bin_struct type for nesting
ushort = bin_struct("H") # (*)
bin_float = bin_struct("f")
bin_string = bin_struct("6c") # see also string_struct as a better alternative
vector = bin_struct("3f")
# or
vector = bin_struct(x=bin_float, y=bin_float, z=bin_float)
# (*) There are some initial types:
# Hardcoded to LITTLE ENDIAN!
# short = "<h"
# ushort = "<H"
# bfloat = "<f"
# uchar = "<B"
# ulong = "<L"
# uint = "<I"
# bint = "<i"
# char = "<i"
After initializing a bin_struct object can be called to parse the given byte argument The necessary byte size can be retrieved with bin_struct.size and updated with bin_struct.calcsize() if a changed was made to the object.
Out of a bytes object the data then can be retrieved by:
xyz = vector(byte[:vector.size])
Even if more bytes are provided only the onces according to size are evaluated.
my_int = ushort(b'00')
This returns a parsed_struct object also an OrderedDict.
The values can then be retrieved with their keywords
xyz.x, xyz.y, xyz.z
For single valued objects with the keyword value
my_int.value
NOTE: Unlike with struct.unpack single values are not in tuples
for example:
ushort(b'00').value # -> 12366 and not (12366,)
For values that shall be converted to strings 'c', 's', 'p'
There are two options. Use a parsed objects parsed_struct().tostring(key="value")
or the parsed_struct class staticmethod parsed_struct.get_string(bytes)
.
The default decode argument for 's' and 'p' are is set with the variable DECODER = 'ASCII'
.
# Others are not tested. I doubt utf works.
parsed_string = bin_string(b'hello\x00') # See examples this was bin_struct("6c")
parsed_string.tostring(s) # -> "hello"
# or
parsed_struct.get_string(parsed_string.s) # -> "hello"
#Note: While typing this i'm not sure how correct this example is.
NOTE: Empty bytes \x00 are removed by default.
tostring
and get_string
have a remove=iterable
(can also be False or None) parameter which let's you keep these or remove others.
The default is set as a list: _default_remove = [b'\x00', 0]
easier to use is the string_struct
Like the license states I won't guarantee that everything works correctly and as intended, it is an early stage project extracted from another one which works as a standalone. If you realize something is not correct feel free the open up issues.