r-lyeh-archived/bundle

to consider

r-lyeh-archived opened this issue · 9 comments

richox/zmolly

https://github.com/fusiyuan2010/CSC - compares to lzma (lzma25 < lzma20 < csc)
https://code.google.com/p/data-shrinker - compares to lz4 (lz4hc < lz4 < shrinker)

mavam commented

lz4hc < lz4 < shrinker

JFYI: According to my measurements with PCAP traces, shrinker trades slightly lower throughput for a small improvement in compression ratio compared to LZ4. For ASCII data, I observe the opposite.

Yep, it's somewhere in the middle of the two LZ4s. Anyway, for only 200 lines of very portable code it's quite a beast :)
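The ratio-vs-speed tradeoffs above are easy to sanity-check with a tiny harness. This is only a sketch: it uses Python's stdlib `zlib` at two effort levels as a stand-in for the fast/tight codec pair (lz4 vs lz4hc), since Python bindings for lz4 or shrinker are an assumption not made here.

```python
import time
import zlib

def benchmark(name, compress, data):
    # Measure compressed size and wall-clock time for one codec setting.
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    ratio = len(packed) / len(data)
    print(f"{name}: ratio={ratio:.3f} time={elapsed * 1000:.1f} ms")
    return ratio

# Repetitive ASCII payload, loosely standing in for text benchmark data.
data = b"the quick brown fox jumps over the lazy dog\n" * 20000

# zlib level 1 (fast) vs level 9 (tight) mimics the lz4-vs-lz4hc tradeoff:
# the higher-effort setting should never compress worse, only slower.
fast = benchmark("zlib-1", lambda d: zlib.compress(d, 1), data)
tight = benchmark("zlib-9", lambda d: zlib.compress(d, 9), data)
```

Swapping in real lz4/shrinker calls would reproduce the ordering claimed above; the harness itself doesn't change.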

http://www.byronknoll.com/cmix.html
zpaq style, improved ratios, but far too memory hungry (>32 GiB)
tried to reduce the memory requirements and it didn't work

mavam commented

zpaq style, improved ratios, but far too memory hungry (>32 GiB)

Woah, what's the input size you're trying to compress? Are you sure there's not a leak somewhere? Such draconian memory requirements seem a bit off.

enwik8, the canonical 100 MiB file widely used for compression tests and benchmarks :)

mavam commented

No disrespect, but it strikes me as highly impractical to keep in-memory state that's bigger than the input by over a factor of 300. Who can afford to use such a library?! Smells like a buggy or flawed implementation.
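For reference, the factor-of-300 figure follows directly from the numbers quoted earlier in the thread (enwik8 is 100 MiB, cmix reportedly needs more than 32 GiB):

```python
# cmix reportedly needs over 32 GiB of RAM to compress enwik8 (100 MiB).
input_mib = 100
memory_mib = 32 * 1024  # 32 GiB expressed in MiB

# Ratio of working memory to input size.
factor = memory_mib / input_mib
print(factor)  # 327.68, i.e. over 300x the input
```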

Well, this is the hardcore side of compression. Done for fun or for research purposes.
And totally prohibitive for regular users.

There are even compression contests like this: https://en.wikipedia.org/wiki/Hutter_Prize

I was curious to see what I could achieve by reducing cmix's memory requirements by a factor of 100 (and to see what kind of new (bigger) compression ratio I could get from it).

It didn't work in the end :/, so I have spent the day on other encodings :)

mavam commented

Ok, an academic exercise is always legit. 😄