The fx-cmix is a updated implementation of fast-cmix.
Prize awarded on February 2, 2024. http://prize.hutter1.net/
This submission contains fallowing modifications on top of the recent fast-cmix Hutter Prize winner:
- paq8hp model is replaced with fxcmv1 model with fallowing notable additions:
- Multiple state tables are used in predictors, this allows better predictability.
- Most contexts are divided between 30 main predictors, this allows more efficient memory usage per context.
- Added bracket, quote, first char, char in paragraph, column, table, template, word stream/paragraph context. These contexts are parsed depending on input, this includes parsing of wiki links, http links, tables, columns, paragraphs, quotes, brackets, list, templates.
- Some contexts are swapped in predictors depending on what current input is (table, column mode, word/paragraph, list).
- Some predictors are switched on/off depending on last char, link or current bracket which improves compression.
- Predictions are mixed with context that are more aware of what predictors are outputting.
- Match model (not present in paq8hp model).
- Predictors are faster allowing more complex contexts.
- new dictionary
- small change in phda9 preprocessor and in two tables in cmix
- memory usage is larger in fxcmv1 model compared to old paq8hp
Below is the fx-cmix result:
Metric | Value |
---|---|
fx-cmix compressor's executable file size (S1) | 436707 bytes |
fx-cmix self-extracting archive size (S2) | 112148343 bytes |
Total size (S) | 112585050 bytes |
Previous record (L) | 114156155 bytes |
fx-cmix improvement (1 - S/L) | 1.38% |
Experiment platform | |
---|---|
Operating system | Ubuntu 20.04 |
Processor | Intel(R) Xeon(R) CPU @ 2.20GHz Geekbench score 706 |
Memory | 32 GB |
Decompression running time | 85 hours |
Decompression RAM max usage | 9575124 KiB |
Decompression disk usage | ~35GB |
Time, disk, and RAM usage are approximately symmetric for compression and decompression.
The installation and usage instructions for fx-cmix are the same as for fast-cmix.
One important note: it is recommended to change one variable in the source code for PPM. From line 26 in src/models/ppmd.cpp:
// If mmap_to_disk is set to false (recommended setting), PPM will only use RAM
// for memory.
// If mmap_to_disk is set to true, PPM memory will be saved to disk using mmap.
// This will reduce RAM usage, but will be slower as well. *Warning*: this will
// write a *lot* of data to disk, so can reduce the lifespan of SSDs. Not
// recommended for normal usage.
bool mmap_to_disk = true;
This variable is set to true by default, to comply with the Hutter Prize RAM limit.
Building fx-cmix compressor from sources requires clang-17, upx-ucl, and make packages. On Ubuntu, these packages can be installed by running the following scripts:
./install_tools/install_upx.sh
./install_tools/install_clang-17.sh
A bash script is provided for compiling cmix-hp compressor from sources on Ubuntu. This script places the cmix-hp executable file named as cmix
in ./run
directory. The script can be run as
./build_and_construct_comp.sh
To run the cmix-hp compressor use
cd ./run
cmix -e <PATH_TO_ENWIK9> enwik9.comp
The compressor is expected to output an executable file named archive9
in the same directory (./run
). The file archive9
when executed is expected to reproduce the original enwik9 as a file named enwik9_restored
. The executable file archive9
should be launched without argments from the directory containing it.
cd ./run
./archive9