bwt_compressor is a lossless compressor/decompressor based on the Burrows–Wheeler transform (BWT). It can be used as a CLI tool or as a Python library.
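The core idea of the Burrows–Wheeler transform can be shown with a naive sketch. This is not the project's implementation. It sorts suffixes with Python's built-in `sorted` (roughly O(n² log n)), and it assumes `\x00` as the end-of-text sentinel. That assumption is consistent with the null-byte restriction described below, but it is our inference. Since `pydivsufsort` is listed as a dependency, the real code presumably builds the suffix array with it instead; that is also an inference. `bwt` and `inverse_bwt` are hypothetical names, not the library's API.

```python
SENTINEL = "\x00"  # assumed end-of-text marker: unique and smaller than every other character


def bwt(text: str) -> str:
    """Naive forward BWT built from a suffix array (hypothetical helper, not the project's API)."""
    if SENTINEL in text:
        raise ValueError("input must not contain the sentinel character")
    s = text + SENTINEL
    # Naive suffix array. A library such as pydivsufsort computes this much faster.
    # With a unique, smallest sentinel, sorting suffixes gives the same order as sorting rotations.
    suffix_array = sorted(range(len(s)), key=lambda i: s[i:])
    # The BWT output is the character preceding each sorted suffix (s[-1] wraps to the sentinel).
    return "".join(s[i - 1] for i in suffix_array)


def inverse_bwt(last: str) -> str:
    """Recover the original text from the BWT output using the LF-mapping."""
    n = len(last)
    # A stable sort of positions by character yields the first column F: F[k] == last[order[k]].
    order = sorted(range(n), key=lambda i: last[i])
    # lf[j] is the row whose rotation starts with last[j] (last-to-first mapping).
    lf = [0] * n
    for k, j in enumerate(order):
        lf[j] = k
    # Row 0 starts with the sentinel, so its last character is the final character of the text.
    # Following the LF-mapping from there yields the text back to front.
    row = 0
    chars = []
    for _ in range(n - 1):
        chars.append(last[row])
        row = lf[row]
    return "".join(reversed(chars))
```

For example, `bwt("banana")` returns `"annb\x00aa"`, and `inverse_bwt` turns that back into `"banana"`. Identical characters end up next to each other (`nn`, `aa`), which makes the transformed text easier for the following stages to compress.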
The compression of data involves three major steps:
Warning: This project is for educational purposes only. It is written in Python and hasn't been optimized for speed or memory usage.
Requirements:
- Python (tested on 3.9)
- numpy
- pydivsufsort
- pytest (for tests only)
- bitarray (for tests only)
- Get the source code:
$ git clone https://github.com/r4victor/bwt_compressor && cd bwt_compressor
- Install the requirements:
$ python -m pip install -r requirements.txt
- Check that everything works by running the tests:
$ python -m pytest tests/
If you have problems installing the pydivsufsort library with pip, consider installing it from source:
- Get the source:
$ git clone https://github.com/louisabraham/pydivsufsort
- Install from the source:
$ python -m pip install pydivsufsort/.
The program reads input data from stdin and writes the compressed result to stdout. For example:
$ cat resources/martin_eden.txt | python -m bwt_compressor > resources/martin_eden.bwt
To decompress data, specify the -d option:
$ cat resources/martin_eden.bwt | python -m bwt_compressor -d > resources/martin_eden_decompressed.txt
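Because both directions are plain stdin/stdout filters, you can check from Python that decompression restores your data exactly, using only the standard `subprocess` module. This is a minimal sketch that assumes it runs from the repository root, so that `python -m bwt_compressor` resolves. `run_filter` and `round_trip_ok` are hypothetical helper names, not part of the project.

```python
import subprocess
import sys
from typing import Sequence

# The two invocations documented in this README.
COMPRESS_CMD = [sys.executable, "-m", "bwt_compressor"]
DECOMPRESS_CMD = [sys.executable, "-m", "bwt_compressor", "-d"]


def run_filter(cmd: Sequence[str], data: bytes) -> bytes:
    """Feed data to cmd on stdin and return what it writes to stdout (raises on non-zero exit)."""
    result = subprocess.run(cmd, input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout


def round_trip_ok(
    data: bytes,
    compress_cmd: Sequence[str] = COMPRESS_CMD,
    decompress_cmd: Sequence[str] = DECOMPRESS_CMD,
) -> bool:
    """Compress, then decompress, and report whether the original bytes came back unchanged."""
    compressed = run_filter(compress_cmd, data)
    return run_filter(decompress_cmd, compressed) == data
```

For instance, `round_trip_ok(open("resources/martin_eden.txt", "rb").read())` should return `True` if the compressor is lossless on that file. The command lists are parameters, so the same check works with any other stdin/stdout filter.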
Currently, the compressor works only with ASCII text that does not contain the null byte (\x00). This limitation may be lifted in the future.
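You can check whether a file meets this restriction before compressing it. This is a minimal sketch: `is_supported_input` is a hypothetical helper, not part of bwt_compressor. One plausible reason for excluding `\x00`, though the README does not state it, is that the BWT step reserves it as an end-of-text sentinel.

```python
def is_supported_input(data: bytes) -> bool:
    """Hypothetical check: True if data is pure ASCII and contains no null byte."""
    # bytes.isascii() (Python 3.7+) is True only if every byte is in the range 0x00-0x7F,
    # so the null byte has to be excluded separately.
    return data.isascii() and b"\x00" not in data
```

For example, `is_supported_input(open("resources/martin_eden.txt", "rb").read())` tells you whether that file is within the current limits. UTF-8 text with accented letters, such as `"café".encode("utf-8")`, is rejected.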