/RagingMoose

Experimental Java LZFSE capable decompressor.

Primary LanguageJavaMIT LicenseMIT

RagingMoose

Experimental Java LZFSE capable decompressor.

What is it?

An experimental Java LZFSE capable decompressor created as a fallback solution for InflatableDonkey. The codebase has been designed from the ground up and barring constants/ tables and a few core routines, has little in the way of resemblance to the source material.

I've opted for simplicity and there's little in the way of optimisation. It's presumed that compression will not be supported.

Unit tests are in place with support for extended tests using tcgen and an external reference lzfse binary (see below). However the decompressor has not been battle tested and bugs may remain.

Should I use it?

The raison d'être of RagingMoose is it's ease of integration into Java projects without the use of external dependencies/ interfacing.

However I would strongly suggest using the reference lzfse compressor in some manner instead if at all possible:

  • Reference lzfse executable/ ProcessBuilder: A (convoluted) example using ProcessBuilder can be found here. It uses a simple utility class ProcessAssistant to handle concurrent streaming.

  • Reference lzfse library/ JNI: JNI example header and c code for the following call:

    public static native long decompress(ByteBuffer src, ByteBuffer dst);

The detailed mechanics of JNI are beyond the scope of this readme.

Benchmarks comparing RagingMoose and the reference lzfse via JNI/ ProcessBuilder are detailed below.

How do I use it?

Create an instance of LZFSEInputStream and consume/ close as an InputStream.

The native constructor accepts ReadableByteChannels.

    public LZFSEInputStream(ReadableByteChannel ch) {
        ...
    }

The InputStream constructor wraps over the native constructor using the Channels#newChannel adapter.

    public LZFSEInputStream(InputStream is) {
        this(Channels.newChannel(is));
    }

A simple example that decompresses and prints the contents of an LZFSE compressed text archive. LZFSEDecoderExceptions signify errors in the underlying data format.

    Path path = Paths.get("my.lzfse.compressed.text.file"); // your LZFSE compressed text file here
    
    try (LZFSEInputStream is = new LZFSEInputStream(Files.newByteChannel(path));
            ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
        int b;
        while ((b = is.read()) != -1) {
            baos.write(b);
        }
        System.out.println(baos.toString("UTF-8"));

    } catch (LZFSEDecoderException ex) {
        System.err.println("Bad LZFSE archive: " + path);

    } catch (IOException ex) {
        System.err.println("IOException: " + ex.toString());
    }

Benchmarks

Decompression benchmarks using JMH. The core benchmarking code is here. Tests are conducted on in-memory byte data. Use the figures as a rough guide only as your use case/ environment may differ significantly.

  • iOS 11 sqlitedb file. 460 bytes (16,384 bytes bytes uncompressed):
Benchmark                            Mode  Cnt      Score     Error  Units
LZFSEBenchmark.lzfseJNI             thrpt   20  32444.002 ± 300.831  ops/s
LZFSEBenchmark.lzfseProcessBuilder  thrpt   20   2086.615 ± 130.519  ops/s
LZFSEBenchmark.ragingMoose          thrpt   20  12571.402 ± 458.201  ops/s
  • J.R.R.Tolkien, The Hobbit. Chapter 1. 47,257 bytes (18,905 bytes compressed):
Benchmark                            Mode  Cnt     Score     Error  Units
LZFSEBenchmark.lzfseJNI             thrpt   20  5404.936 ± 379.371  ops/s
LZFSEBenchmark.lzfseProcessBuilder  thrpt   20  1900.292 ±   8.834  ops/s
LZFSEBenchmark.ragingMoose          thrpt   20  1631.971 ± 102.401  ops/s
  • Homer, The Iliad. Full. 808,298 bytes (297,947 bytes compressed):
Benchmark                            Mode  Cnt    Score    Error  Units
LZFSEBenchmark.lzfseJNI             thrpt   20  307.237 ± 26.004  ops/s
LZFSEBenchmark.lzfseProcessBuilder  thrpt   20  251.843 ±  1.500  ops/s
LZFSEBenchmark.ragingMoose          thrpt   20  116.270 ±  7.660  ops/s
  • Rust-lang, rust-master tarball. 42,270,720 bytes (6,813,253 bytes compressed):
Benchmark                            Mode  Cnt   Score   Error  Units
LZFSEBenchmark.lzfseJNI             thrpt   20  11.388 ± 0.220  ops/s
LZFSEBenchmark.lzfseProcessBuilder  thrpt   20   5.989 ± 0.166  ops/s
LZFSEBenchmark.ragingMoose          thrpt   20   4.583 ± 0.150  ops/s

Extended unit tests

LZFSEInputStreamTest#tcgenTest is by default set to @Ignore. It requires both the tcgen and lzfse binaries to be on the command path either as .exe binaries or their extensionless counterparts.

With the binaries in place and @Ignore deleted/ commented out an additional series of tests is performed. These generate test data using tcgen. This data is then compressed with the reference lzfse binary and in turn decompressed by RagingMoose. We expected the decompressed data to match the generated test data.

The data is cut to predetermined lengths to hit the various underlying block types (bvx-, bvxn, bvx2).

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running com.github.horrorho.ragingmoose.LZFSEInputStreamTest
Tests run: 32, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 14.196 sec - in com.github.horrorho.ragingmoose.LZFSEInputStreamTest

Results :

Tests run: 32, Failures: 0, Errors: 0, Skipped: 1

LZFSEInputStreamTest#tcgenTestExt is by default set to @Ignore. It's essentially a sanity test and bypasses RagingMoose entirely and compresses/ decompresses using lzfse. It's not suitable as a benchmark as the chokepoint is in tcgen test data generation.

What's with the name?

I'm not quite sure... It seemed like a good idea at the time.

Links

LZFSE - reference implementation.

Asymmetric_Numeral_Systems - bed time reading.

Finite State Entropy - A new breed of entropy coder - wonderful series of articles.

Compression test file generator - interesting thread post by Matt Mahoney author of tcgen. I've warped some of the examples into unit tests.