Python implementation of arithmetic encoding/decoding for lossless data compression.
Read a blog post about this implementation here.
This project provides a clean, correct, and modern Python implementation of the arithmetic coding algorithm. It is based on the classic 1987 paper "Arithmetic coding for data compression" by Witten, Neal, and Cleary.
Key features:
- Clear and readable Python code
- Focused on correctness and educational value
- Extensive parametrized tests for verification
Note: This implementation prioritizes clarity and correctness over performance. For production use, consider more optimized implementations in lower-level languages.
Here's an example showing how to use the ArithmeticEncoder
.
>>> from arithmetic_coding import ArithmeticEncoder
>>> message = ['A', 'B', 'B', 'B', '<EOM>']
>>> frequencies = {'A': 1, 'B': 3, '<EOM>': 1}
>>> encoder = ArithmeticEncoder(frequencies=frequencies)
>>> bits = list(encoder.encode(message))
>>> bits
[0, 1, 0, 1, 1, 1, 0, 0]
>>> list(encoder.decode(bits))
['A', 'B', 'B', 'B', '<EOM>']
Run the test suite to verify the implementation.
pytest . --doctest-modules
Contributions are welcome! Please feel free to submit a Pull Request. Stay within the scope of the project: clean, readable Python implementation, no low-level optimizations.
The two main references that I used were:
- Arithmetic coding for data compression. Ian H. Witten, Radford M. Neal, and John G. Cleary. 1987. Commun. ACM 30, 6 (June 1987), 520–540. https://doi.org/10.1145/214762.214771
- The 2014 blog post Data Compression With Arithmetic Coding.
Other useful references to deep-dive into the material:
- github.com/nayuki/Reference-arithmetic-coding
- The book Information Theory, Inference and Learning Algorithms by David J. C. MacKay
- The Data Compression Book by Mark Nelson and Jean-loup Gailly