Compression and decompression using various algorithms provided by the basic compression library (BCL).

pybcl

This project brings the basic compression library (BCL) to Python. These are not bindings: the wrapped library is bundled in the compiled binary. A few changes have been made to the original BCL to (hopefully) prevent segmentation faults during LZ77 and RLE decompression, and to ease development of the module.

Algorithms

The BCL contains a C implementation of these five algorithms:

  • Huffman
  • Lempel-Ziv (LZ77)
  • Rice
  • RLE (Run-length encoding)
  • Shannon-Fano

Requirements

Python 3.7+

Installation

pip install pybcl

Usage

Caution

While an effort has been made to prevent buffer overflows in the RLE and LZ77 decompression algorithms, the other three (Huffman, Rice and Shannon-Fano) are very likely to segfault if you give them corrupt or random data.

API

from pybcl import compress, decompress, ...

# Functions exposed by the C extension.
def compress(data, algo, header=False): ...
def decompress(data, algo=0, outsize=0): ...

# Shortcut functions.
def huffman_compress(data, header=False): ...
def lz_compress_fast(data, header=False): ...
def rice_compress(data, format, header=False): ...
def rle_compress(data, header=False): ...
def sf_compress(data, header=False): ...

def huffman_decompress(data, outsize=0): ...
def lz_decompress(data, outsize=0): ...
def rice_decompress(data, format, outsize=0): ...
def rle_decompress(data, outsize=0): ...
def sf_decompress(data, outsize=0): ...

For compression, you can choose whether the header should be included in the result.

For decompression, you can override outsize by passing a positive value. Neither algo nor outsize is required if the data contains a header.

Two enums are provided for the algorithms and Rice formats. Example:

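from pybcl import Algorithm, RiceFormat, compress, rice_decompress

data = b"test"
# Compress without a header (header defaults to False).
compressed = compress(data, Algorithm.RICE8)
# Without a header, the Rice format and the output size must be given explicitly.
decompressed = rice_decompress(compressed, RiceFormat.UINT8, len(data))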

Command line

Compression:

usage: pybcl c [-h] [-a ALGO] [-o OFFSET] [-m SIZE] [-f] [--no-header] src dest

positional arguments:
  src                         input file
  dest                        output file

options:
  -h, --help                  show this help message and exit
  -a ALGO, --algo ALGO        algorithm for (de)compression. Not required for decompression if a header is present
  -o OFFSET, --offset OFFSET  position in src where to start reading from
  -m SIZE, --maxread SIZE     max amount of bytes to read from src. Default: all that can be read
  -f, --force                 overwrite dest
  --no-header                 do not write a header for the file

Decompression:

usage: pybcl d [-h] [-a ALGO] [-o OFFSET] [-m SIZE] [-f] [-s SIZE] [--hvariant] src dest

positional arguments:
  src                         input file
  dest                        output file

options:
  -h, --help                  show this help message and exit
  -a ALGO, --algo ALGO        algorithm for (de)compression. Not required for decompression if a header is present
  -o OFFSET, --offset OFFSET  position in src where to start reading from
  -m SIZE, --maxread SIZE     max amount of bytes to read from src. Default: all that can be read
  -f, --force                 overwrite dest
  -s SIZE, --outsize SIZE     required if no header
  --hvariant                  force reading the header variant

When decompressing data that has a header with LZ77 or RLE, if you get an OutputOverrun error you can override the header's outsize to specify a higher value.
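
The same idea applies when calling the Python API: retry with a larger outsize until the output fits. Below is a minimal sketch of such a retry loop. The helper name `decompress_with_retry`, its parameters and the doubling strategy are all hypothetical and not part of pybcl. The exception class is passed in as an argument because this README does not show how the OutputOverrun error is exposed to Python code.

```python
from typing import Callable, Type


def decompress_with_retry(
    decompress_fn: Callable[[bytes, int], bytes],
    data: bytes,
    first_outsize: int,
    overrun_error: Type[Exception],
    max_outsize: int = 1 << 26,
) -> bytes:
    """Call decompress_fn(data, outsize), doubling outsize after each overrun.

    Hypothetical helper, not part of pybcl. `overrun_error` is whatever
    exception the decompressor raises when the output buffer is too small.
    """
    if first_outsize <= 0:
        raise ValueError("first_outsize must be positive")
    outsize = first_outsize
    while outsize <= max_outsize:
        try:
            return decompress_fn(data, outsize)
        except overrun_error:
            # The output did not fit: try again with twice the room.
            outsize *= 2
    raise ValueError(f"output still overruns at the {max_outsize}-byte limit")
```

With pybcl, `decompress_fn` could be a shortcut function such as `lz_decompress` or `rle_decompress`, both of which take `(data, outsize)`. The `max_outsize` cap is there so that corrupt input cannot make the loop allocate without bound.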

Header variant

Some camera firmwares contain parts that are compressed with a modified version of the BCL. This version adds the size of the compressed data to the header and replaces two of the always-empty bytes of the algo field with unknown data (maybe a checksum). A HeaderVariant class is provided for this specific case. For now, only the CLI makes use of this class. Note that this has nothing to do with the original library and is only included because I need it for another project. See here for an example.

Original library

The BCL was written by Marcus Geelnard and is licensed under the terms of the zlib license.

You can find it here:

It comes with the basic file compressor, or BFC, which is a test application for the BCL. Data compressed with the BFC starts with the BCL1 magic.
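
Since BFC output starts with the BCL1 magic, a quick check can tell whether a blob looks like it carries a header before you decide to pass algo and outsize. The sketch below uses only the standard library. The names `BCL_MAGIC`, `has_bcl_magic` and `file_has_bcl_magic` are illustrative and not part of pybcl, and nothing is assumed about the header layout beyond the four magic bytes.

```python
BCL_MAGIC = b"BCL1"  # magic that data compressed with the BFC starts with


def has_bcl_magic(data: bytes, offset: int = 0) -> bool:
    """Return True if `data` contains the BCL1 magic at `offset`."""
    return data[offset:offset + len(BCL_MAGIC)] == BCL_MAGIC


def file_has_bcl_magic(path: str, offset: int = 0) -> bool:
    """Same check for a file, reading only the four magic bytes."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(len(BCL_MAGIC)) == BCL_MAGIC
```

The `offset` parameter mirrors the CLI's `-o OFFSET` option, which is handy when the compressed part sits inside a larger file.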