/compression-challenge

A compression challenge that I bet nobody will solve.

compression-challenge

I have provided seven binary files in a secret format. These files are each several megabytes, summing to ~30MB.

$ du -k files/*
6000    files/file1
2932    files/file2
6000    files/file3
4208    files/file4
2836    files/file5
4504    files/file6
4884    files/file7
$ du -k files/
31368   files/

Your goal is to create a compression and decompression algorithm that achieves at least a ~9x reduction in size, including the size of the compression and decompression program.

$ mkdir compressed
$ for i in {1..7}; do ./solution compress -input files/file${i} -output compressed/file${i}; done
$ du -k compressed/*
32      compressed/file1
4       compressed/file2
32      compressed/file3
156     compressed/file4
16      compressed/file5
212     compressed/file6
232     compressed/file7
$ du -k compressed/
688     compressed/
$ du -k solution 
2304    solution

Here, the compressed data is now 0.7MB, and the compression program itself is 2.3MB. This totals to 2.99MB, which is less than 10x the size of our original data (~31MB).

The compression is required to be lossless. This means that we can now decompress this data into the original files, and they must be bit-for-bit equal.

$ mkdir decompressed
$ for i in {1..7}; do ./solution decompress -input compressed/file${i} -output decompressed/file${i}; done
$ sum decompressed/*
60340  5997 decompressed/file1
49613  2930 decompressed/file2
14037  5997 decompressed/file3
37567  4205 decompressed/file4
01491  2833 decompressed/file5
00669  4503 decompressed/file6
06482  4884 decompressed/file7
$ sum files/*
60340  5997 files/file1
49613  2930 files/file2
14037  5997 files/file3
37567  4205 files/file4
01491  2833 files/file5
00669  4503 files/file6
06482  4884 files/file7

Rules

  • You must achieve at least a 9x compression ratio to win.
  • You must include the size of your compressor + decompressor when computing the compression ratio. This needn't include the size of a language interpreter, but should include the size of standard libraries or other libraries that you use.
  • You may not store secret information in filenames or other metadata that is not counted in the compressed size.
  • Your compression algorithm must be lossless; the decompressed data must match the original data before compression, byte-for-byte.
  • You may not find your solution by hacking my computer or my online accounts!