/arithmetic-coding

Python implementation of arithmetic encoding/decoding for lossless data compression.

Primary LanguagePythonMIT LicenseMIT

arithmetic-coding

Python implementation of arithmetic encoding/decoding for lossless data compression.

Read a blog post about this implementation here.

Overview

This project provides a clean, correct, and modern Python implementation of the arithmetic coding algorithm. It is based on the classic 1987 paper "Arithmetic coding for data compression" by Witten, Neal, and Cleary.

Key features:

  • Clear and readable Python code
  • Focused on correctness and educational value
  • Extensive parametrized tests for verification

Note: This implementation prioritizes clarity and correctness over performance. For production use, consider more optimized implementations in lower-level languages.

Example

Here's an example showing how to use the ArithmeticEncoder.

>>> from arithmetic_coding import ArithmeticEncoder
>>> message = ['A', 'B', 'B', 'B', '<EOM>']
>>> frequencies = {'A': 1, 'B': 3, '<EOM>': 1}
>>> encoder = ArithmeticEncoder(frequencies=frequencies)
>>> bits = list(encoder.encode(message))
>>> bits
[0, 1, 0, 1, 1, 1, 0, 0]
>>> list(encoder.decode(bits))
['A', 'B', 'B', 'B', '<EOM>']

Testing

Run the test suite to verify the implementation.

pytest . --doctest-modules

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. Stay within the scope of the project: clean, readable Python implementation, no low-level optimizations.

References

The two main references that I used were:

Other useful references to deep-dive into the material: