/huffskew

An Optimization of the Huffman Compression Algorithm for Highly Skewed Alphabet Distributions

Primary LanguageC++MIT LicenseMIT

Huffskew

An Optimization of the Huffman Compression Algorithm for Highly Skewed Alphabet Distributions

Description

Huffskew is a variant of the Huffman compression algorithm that optimizes the algorithm for alphabet symbols with singular occurrences. Huffman compression relies on symbol frequency analysis to determine a mapping for variable-length coding. Due to the natural tendency toward uniformity for an increasing sample size, Huffman compression is generally used only for small to medium compression tasks, or otherwise done in smaller "chunks" for larger data, thus yielding a greater skew in the distribution to maximize use of its frequency-based variable-length coding. Nevertheless, even for highly skewed distributions, the "tail" end of the data remains unoptimized, and for certain symbols in the tail (specifically, those with singular/almost singular occurrences) the Huffman compression algorithm takes more space than the original symbol itself. Therefore, this optimization minimizes the unnecessary storage space for symbols with singular occurrences using an ordered replacement method.

Installation

Build Huffskew with Make:

make clean
make

Install/Uninstall:

sudo make install
sudo make uninstall

Usage

huffskew [action] [filename] -o [filename]

Option          Description
-h              Show this help screen
-c [filename]   Compress action
-u [filename]   Uncompress action
-o [filename]   Output filename

Developers

Ashish D'Souza - @computer-geek64

Versioning

This project uses the Git Version Control System (VCS).

License

This project is licensed under the MIT License.