/minimus-encoder

Data stream encoder for compressing sequences of floats/integer values to bits

Primary LanguageGoApache License 2.0Apache-2.0

minimus-encoder

Project description

Data stream encoder/decoder for compressing sequences of float/integer values to bits.

It focuses on encoding vectors of elements, i.e. interleaved series.

Heavily inspired by Facebook's Gorilla TSDB paper.

Shipped with a lossy float64 transform function, allowing more efficient (but lossy) storage. This module can still be used for lossless encoding.

Thanks icza for his bitio library, heavily used in this project.

Current status

The project is quite young and could definitely use more testing and eventually some optimizations. I am currently using it to store large amounts of fixed-interval time-series on the cloud.

There is currently an issue with values that rapidly oscillates around zero (i.e. flipping the float64's sign bit too often). In case you would like to use this project, it is best advised to apply a bias to your inputs in order to avoid flipping the sign bit too often.

Pull requests and suggestions are welcome, feel free to open an issue.

Example

See the example directory. The program compresses/uncompresses an arbitrary sequence of vectors with different loss levels. Finally, it prints out both inputs and outputs sequences, and displays the mean compressed data bits per sample.

go run ./example

        0: 60.4700  0.0163  1.0000  1.0000  => 60.3750  0.0156  1.0000  1.0000
        1: 94.0500  0.0105  1.0000  1.0000  => 94.0000  0.0078  1.0000  1.0000
        2: 66.4600  0.0148  1.0000  1.0000  => 66.3750  0.0117  1.0000  1.0000
      999: 45.0700  0.0217 32.0000 77.0000  => 45.0000  0.0156 32.0000 77.0000
     1000:  1.1111  2.2222  3.3333  4.4444  =>  1.0625  2.1250  3.2500  4.3750
|e|=1e-01:  6.999 b/sample

        0: 60.4700  0.0163  1.0000  1.0000  => 60.4700  0.0162  1.0000  1.0000
        1: 94.0500  0.0105  1.0000  1.0000  => 94.0499  0.0105  1.0000  1.0000
        2: 66.4600  0.0148  1.0000  1.0000  => 66.4600  0.0148  1.0000  1.0000
      999: 45.0700  0.0217 32.0000 77.0000  => 45.0699  0.0216 32.0000 77.0000
     1000:  1.1111  2.2222  3.3333  4.4444  =>  1.1111  2.2222  3.3333  4.4444
|e|=1e-04: 11.838 b/sample

        0: 60.4700  0.0163  1.0000  1.0000  => 60.4700  0.0163  1.0000  1.0000
        1: 94.0500  0.0105  1.0000  1.0000  => 94.0500  0.0105  1.0000  1.0000
        2: 66.4600  0.0148  1.0000  1.0000  => 66.4600  0.0148  1.0000  1.0000
      999: 45.0700  0.0217 32.0000 77.0000  => 45.0700  0.0217 32.0000 77.0000
     1000:  1.1111  2.2222  3.3333  4.4444  =>  1.1111  2.2222  3.3333  4.4444
|e|=1e-07: 16.819 b/sample

        0: 60.4700  0.0163  1.0000  1.0000  => 60.4700  0.0163  1.0000  1.0000
        1: 94.0500  0.0105  1.0000  1.0000  => 94.0500  0.0105  1.0000  1.0000
        2: 66.4600  0.0148  1.0000  1.0000  => 66.4600  0.0148  1.0000  1.0000
      999: 45.0700  0.0217 32.0000 77.0000  => 45.0700  0.0217 32.0000 77.0000
     1000:  1.1111  2.2222  3.3333  4.4444  =>  1.1111  2.2222  3.3333  4.4444
|e|=1e-10: 21.762 b/sample

        0: 60.4700  0.0163  1.0000  1.0000  => 60.4700  0.0163  1.0000  1.0000
        1: 94.0500  0.0105  1.0000  1.0000  => 94.0500  0.0105  1.0000  1.0000
        2: 66.4600  0.0148  1.0000  1.0000  => 66.4600  0.0148  1.0000  1.0000
      999: 45.0700  0.0217 32.0000 77.0000  => 45.0700  0.0217 32.0000 77.0000
     1000:  1.1111  2.2222  3.3333  4.4444  =>  1.1111  2.2222  3.3333  4.4444
|e|=1e-13: 26.543 b/sample

        0: 60.4700  0.0163  1.0000  1.0000  => 60.4700  0.0163  1.0000  1.0000
        1: 94.0500  0.0105  1.0000  1.0000  => 94.0500  0.0105  1.0000  1.0000
        2: 66.4600  0.0148  1.0000  1.0000  => 66.4600  0.0148  1.0000  1.0000
      999: 45.0700  0.0217 32.0000 77.0000  => 45.0700  0.0217 32.0000 77.0000
     1000:  1.1111  2.2222  3.3333  4.4444  =>  1.1111  2.2222  3.3333  4.4444
|e|=1e-16: 29.764 b/sample

        0: 60.4700  0.0163  1.0000  1.0000  => 60.4700  0.0163  1.0000  1.0000
        1: 94.0500  0.0105  1.0000  1.0000  => 94.0500  0.0105  1.0000  1.0000
        2: 66.4600  0.0148  1.0000  1.0000  => 66.4600  0.0148  1.0000  1.0000
      999: 45.0700  0.0217 32.0000 77.0000  => 45.0700  0.0217 32.0000 77.0000
     1000:  1.1111  2.2222  3.3333  4.4444  =>  1.1111  2.2222  3.3333  4.4444
(lossless)    30.078 b/sample