⚠️ This project is currently in the planning stage. The documentation is only partially accurate and there are lots of bugs and missing features!
bitformat is a Python module for creating, manipulating and interpreting binary data. It also supports parsing and creating more complex binary formats.
It is from the author of the widely used bitstring module.
- The Bits class represents a sequence of binary data of arbitrary length. It provides methods for creating, modifying and interpreting the data.
- The Format class provides a way to define a binary format using a simple and flexible syntax.
- A wide array of data types is supported with no arbitrary restrictions on length.
- Data is always stored efficiently as a contiguous array of bits.
A variety of constructor methods are available to create Bits, including from binary, hexadecimal or octal strings, formatted strings, byte literals and iterables.
>>> from bitformat import *
>>> a = Bits('0b1010') # Create from a binary string
>>> b = Bits('u12 = 54') # Create from a formatted string.
>>> c = Bits.from_bytes(b'\x01\x02\x03') # Create from a bytes or bytearray object.
>>> d = Bits.pack('f16', -0.75) # Pack a value into a data type.
>>> e = Bits.join([a, b, c, d]) # The best way to join lots of bits together.
Although the examples above were created from a variety of data types, the Bits instance doesn't retain any knowledge of how it was created - it's just a sequence of bits.
You can therefore interpret them however you'd like:
>>> a.i
-6
>>> b.hex
'036'
>>> c.unpack(['u4', 'f16', 'u4'])
[0, 0.0005035400390625, 3]
>>> d.bytes
b'\xba\x00'
The unpack method is available as a general-case way to unpack the bits into a single or multiple data types.
If you only want to unpack to a single data type you can use properties of the Bits as a short-cut.
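To see why the same four bits can read as either 10 or -6, here is a minimal sketch in plain Python (no bitformat involved) of unsigned and two's complement interpretation. The helper names as_unsigned and as_signed are hypothetical and are not part of the library.

```python
def as_unsigned(bits: str) -> int:
    """Interpret a non-empty string of '0'/'1' characters as an unsigned integer."""
    return int(bits, 2)


def as_signed(bits: str) -> int:
    """Interpret the same string as a two's complement signed integer."""
    value = int(bits, 2)
    # A leading 1 marks a negative value: subtract 2**length.
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value


bits = "1010"
print(as_unsigned(bits))  # 10
print(as_signed(bits))    # -6
```

The stored bits are identical in both calls; only the interpretation differs.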
A wide range of data types are supported. These are essentially descriptions of how binary data can be converted to a useful value. The Dtype class is used to define these, but usually just the string representation can be used.
Some example data type strings are:
- 'u3' - a 3 bit unsigned integer.
- 'i_le32' - a 32 bit little-endian signed integer.
- 'f64' - a 64 bit IEEE float. Lengths of 16, 32 and 64 are supported.
- 'bool' - a single bit boolean value.
- 'bytes10' - a 10 byte sequence.
- 'hex' - a hexadecimal string.
- 'bin' - a binary string.
- '[u8; 40]' - an array of 40 unsigned 8 bit integers.
Byte endianness for floating point and integer data types is specified by adding a _le (little-endian), _be (big-endian) or _ne (native-endian) suffix to the base type.
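As a rough analogy for what byte endianness changes, the following sketch uses the standard library struct module rather than bitformat. It only covers whole-byte sizes, and treating struct's '>' and '<' prefixes as the counterparts of the _be and _le suffixes is an assumption made for illustration.

```python
import struct

# '>e' is a big-endian IEEE half-precision (16 bit) float.
f16_be = struct.pack(">e", -0.75)
print(f16_be)  # b'\xba\x00'

# The same value in little-endian byte order has its two bytes swapped.
f16_le = struct.pack("<e", -0.75)
print(f16_le)  # b'\x00\xba'

# A 32 bit signed integer in little-endian and big-endian byte order.
i32_le = struct.pack("<i", -2)
i32_be = struct.pack(">i", -2)
print(i32_le.hex(), i32_be.hex())  # feffffff fffffffe
```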
An extensive set of operations is available to query Bits or to create new ones. For example:
>>> a + b # Concatenation
Bits('0xa036')
>>> c.find('0b11') # Returns found bit position
22
>>> b.replace('0b1', '0xfe')
Bits('0x03fbf9fdfc')
>>> b[0:10] | d[2:12] # Slicing and logical operators
Bits('0b1110101101')
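To make the results above easier to check by hand, here is a sketch that reproduces the find, replace and slice-then-OR steps using ordinary Python strings of '0' and '1' characters. It illustrates the bit-level arithmetic only and says nothing about how bitformat implements these operations.

```python
# The same values as in the examples above, written out as bit strings.
b = format(54, "012b")  # 'u12 = 54' -> '000000110110'
c = "".join(format(byte, "08b") for byte in b"\x01\x02\x03")
d = "1011101000000000"  # the 16 bit float encoding of -0.75 (0xba00)

# find: index of the first occurrence of the pattern '11'.
position = c.find("11")
print(position)  # 22

# replace: every '1' bit becomes the eight bits of 0xfe.
replaced = b.replace("1", format(0xFE, "08b"))
replaced_hex = format(int(replaced, 2), "010x")
print(replaced_hex)  # 03fbf9fdfc

# slice then OR: combine two 10 bit slices bit by bit.
ored = format(int(b[0:10], 2) | int(d[2:12], 2), "010b")
print(ored)  # 1110101101
```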
An Array class is provided which stores a contiguous sequence of Bits of the same data type.
This is similar to the array type in the standard module of the same name, but it's not restricted to just a dozen or so types.
>>> r = Array('i5', [4, -3, 0, 1, -5, 15]) # An array of 5 bit signed ints
>>> r -= 2 # Operates on each element
>>> r.unpack()
[2, -5, -2, -1, -7, 13]
>>> r.dtype = 'u6' # You can freely change the data type
>>> r
Array('u6', [5, 47, 55, 60, 45])
>>> r.to_bits()
Bits('0b000101101111110111111100101101')
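Changing the data type works because the underlying bits stay the same and are simply regrouped. The sketch below reproduces the example in plain Python. The helpers pack_signed and unpack_unsigned are hypothetical names, and dropping leftover bits that don't fill a whole element is a choice made for this sketch, not a statement about bitformat's behaviour.

```python
def pack_signed(values, width):
    """Concatenate the two's complement encoding of each value, width bits each."""
    mask = (1 << width) - 1
    return "".join(format(v & mask, f"0{width}b") for v in values)


def unpack_unsigned(bits, width):
    """Split a bit string into whole width-bit unsigned integers.

    Trailing bits that don't fill a complete element are ignored.
    """
    usable = len(bits) - len(bits) % width
    return [int(bits[i:i + width], 2) for i in range(0, usable, width)]


values = [v - 2 for v in [4, -3, 0, 1, -5, 15]]  # subtract 2 from each element
bits = pack_signed(values, 5)                    # six 5 bit signed ints, 30 bits
print(bits)                                      # 000101101111110111111100101101
print(unpack_unsigned(bits, 6))                  # [5, 47, 55, 60, 45]
```

Thirty bits divide evenly into five 6 bit elements, which is why the element count drops from six to five.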
The Format class can be used to give structure to bits, as well as storing the data in a human-readable form.
>>> f = Format('[width: u12, height: u12, flags: [bool; 4]]')
>>> f.pack([320, 240, [True, False, True, False]])
Bits('0x1400f0a')
>>> print(f)
[
width: u12 = 320,
height: u12 = 240,
flags: [bool; 4] = (True, False, True, False)
]
>>> f['height'].value /= 2
>>> f.to_bits()
Bits('0x140078a')
>>> f.to_bits() == 'u12=320, u12=120, 0b1010'
True
The Format and its fields can optionally have names (the Format above is unnamed, but its fields are named).
In this example the pack method was used with appropriate values, which then returned a Bits object.
The Format now contains all the interpreted values, which can be easily accessed and modified.
The final line in the example above demonstrates how new Bits objects can be created when needed by promoting other types: in this case the formatted string is promoted to a Bits object before the comparison is made.
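For comparison, this is roughly what packing the same three fields by hand looks like with shifts and masks in plain Python. The function pack_header is a hypothetical helper written for this sketch. It shows the bit layout that the format string describes, not how bitformat works internally.

```python
def pack_header(width, height, flags):
    """Pack two 12 bit unsigned fields and four 1 bit flags into a 28 bit integer."""
    if not (0 <= width < 1 << 12 and 0 <= height < 1 << 12):
        raise ValueError("width and height must each fit in 12 bits")
    if len(flags) != 4:
        raise ValueError("exactly four flags are required")
    flag_bits = 0
    for flag in flags:
        # The first flag ends up as the most significant of the four bits.
        flag_bits = (flag_bits << 1) | int(bool(flag))
    return (width << 16) | (height << 4) | flag_bits


packed = pack_header(320, 240, [True, False, True, False])
print(f"{packed:07x}")  # 1400f0a

halved = pack_header(320, 120, [True, False, True, False])
print(f"{halved:07x}")  # 140078a
```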
The Format can be used symmetrically to both create and parse binary data:
>>> f.parse(b'x\x048\x10')
28
>>> f
Format([
'width: u12 = 1920',
'height: u12 = 1080',
'flags: [bool; 4] = (False, False, False, True)'
])
The parse method is able to lazily parse the input bytes, and simply returns the number of bits that were consumed. The actual values of the individual fields aren't calculated until they are needed, which allows large and complex file formats to be efficiently dealt with.
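The idea of deferring field decoding until a value is requested can be sketched in plain Python as follows. LazyHeader is a hypothetical class written for this illustration. It hard-codes the 28 bit layout used above and makes no claim about how bitformat implements lazy parsing.

```python
from functools import cached_property


class LazyHeader:
    """Lazy view over a 28 bit header: a u12, another u12, then four flags."""

    bit_length = 28

    def __init__(self, data: bytes):
        if len(data) * 8 < self.bit_length:
            raise ValueError("not enough data for a 28 bit header")
        # Only the raw bytes are stored here; nothing is decoded yet.
        self._data = data[:4]

    @cached_property
    def _raw(self) -> int:
        # Decoded on first access: drop the 4 unused trailing bits of 32.
        return int.from_bytes(self._data, "big") >> 4

    @property
    def width(self) -> int:
        return (self._raw >> 16) & 0xFFF

    @property
    def height(self) -> int:
        return (self._raw >> 4) & 0xFFF

    @property
    def flags(self) -> tuple:
        return tuple(bool((self._raw >> shift) & 1) for shift in (3, 2, 1, 0))


header = LazyHeader(b"x\x048\x10")
print(header.width, header.height, header.flags)
# 1920 1080 (False, False, False, True)
```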
The bitformat library is still pre-alpha and is being actively developed.
I'm hoping to make an alpha release or two in late 2024, with more features added in 2025.
There are a number of important features planned, some of which are from the bitstring library on which much of the core is based, and others are needed for a full binary format experience.
The (unordered) to-do list includes:
- Streaming methods. There is no concept of a bit position, or of reading through a Bits. This is available in bitstring, but I want to find a better way of doing it before adding it to bitformat.
- Field expressions. Rather than hard coding everything in a field, some parts will be calculated during the parsing process. For example in the format '[w: u16, h: u16, [u8; {w * h}]]' the size of the 'u8' array would depend on the values parsed just before it.
- New field types. Fields like Repeat, Find and If are planned which will allow more flexible formats to be written.
- Exotic floating point types. In bitstring there are a number of extra floating point types such as bfloat and the MXFP 8, 6 and 4-bit variants. These will be ported over to bitformat.
- Performance improvements. A primary focus of the design of bitformat is that it should be fast. Early versions won't be well optimized, but tests so far are quite promising, and the design philosophy should mean that it can be made even more performant later.
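The field expression idea from the list above, where an array length depends on values parsed earlier, can be illustrated by hand with the standard library struct module. This is only a sketch of the concept: parse_image is a hypothetical helper, and big-endian byte order for the two u16 fields is an assumption.

```python
import struct


def parse_image(data: bytes):
    """Parse a u16 width, a u16 height, then width * height u8 values."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes for width and height")
    w, h = struct.unpack_from(">HH", data, 0)
    count = w * h  # the array length depends on the fields parsed just before it
    if len(data) < 4 + count:
        raise ValueError("not enough pixel data")
    pixels = list(data[4:4 + count])
    return w, h, pixels


sample = struct.pack(">HH", 3, 2) + bytes([10, 20, 30, 40, 50, 60])
print(parse_image(sample))  # (3, 2, [10, 20, 30, 40, 50, 60])
```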
Copyright (c) 2024 Scott Griffiths