libz77 implementation of LZ77

This is prototype project. The samplestarget folder (created during the trasnspile stage of the build) provides some file I/O examples and some string compression examples.

synopsis

This is my implementation of the LZ77 compression algorithm described by Yaakov Ziv and Abraham Lempel in thier 1977 paper. LZ77 utilises a structure often described as a sliding window. I like to view the sliding window as a variation of the Cursor (Iterator in Design Patterns: Elements of Reusable Object-Oriented Software) pattern, where instead of describing a single index, the current position describes a range of elements and the traversal rate has no connection to the cursor size. The sliding window designates the bottom half of the cursor as the dictionary and the top half as the input stream.

The compression process produces a series of compressed frames, each one describing a single token and a pointer onto a range in the dictionary, relative to the current cursor position.

Building

$ npm install

ibz7 ‹master*› % npm run build

> libz7@0.1.0 prebuild /Users/burtol86/lisa-workspace/libz7
> babel src --out-dir lib --source-maps && babel samples --out-dir samplestarget --source-maps

Successfully compiled 8 files with Babel.
Successfully compiled 1 file with Babel.

Unit tests

> libz7@0.1.0 prebuild /Users/burtol86/lisa-workspace/libz7
> babel src --out-dir lib --source-maps && babel samples --out-dir samplestarget --source-maps

Successfully compiled 15 files with Babel.
Successfully compiled 6 files with Babel.

> libz7@0.1.0 build /Users/burtol86/lisa-workspace/libz7
> npm run test


> libz7@0.1.0 test /Users/burtol86/lisa-workspace/libz7
> jest --coverage

 PASS  __tests__/decompressor-transformer.test.spec.js
 PASS  __tests__/compressor-transformer.test.spec.js
 PASS  __tests__/serialization/packet-header-from-binary.test.spec.js
 PASS  __tests__/serialization/packet-from-binary.test.spec.js
 PASS  __tests__/locate-token.test.spec.js
 PASS  __tests__/consume-input.test.spec.js
 PASS  __tests__/serialization/packet-to-binary.test.spec.js
 PASS  __tests__/find-index-of-subarray.test.spec.js
 PASS  __tests__/sliding-window.test.spec.js
 PASS  __tests__/serialization/packet-header-generator.test.spec.js
 PASS  __tests__/serialization/unpack-integer-to-byte.test.spec.js
 PASS  __tests__/serialization/packet-header-to-binary.test.spec.js
 PASS  __tests__/config/config.test.spec.js
-------------------------------|----------|----------|----------|----------|-------------------|
File                           |  % Stmts | % Branch |  % Funcs |  % Lines | Uncovered Line #s |
-------------------------------|----------|----------|----------|----------|-------------------|
All files                      |      100 |    99.16 |      100 |      100 |                   |
 src                           |      100 |    98.21 |      100 |      100 |                   |
  compressor-transformer.js    |      100 |      100 |      100 |      100 |                   |
  consume-input.js             |      100 |      100 |      100 |      100 |                   |
  decompressor-transformer.js  |      100 |    91.67 |      100 |      100 |                75 |
  find-index-of-subarray.js    |      100 |      100 |      100 |      100 |                   |
  locate-token.js              |      100 |      100 |      100 |      100 |                   |
  sliding-window.js            |      100 |      100 |      100 |      100 |                   |
 src/config                    |      100 |      100 |      100 |      100 |                   |
  config.js                    |      100 |      100 |      100 |      100 |                   |
 src/serialization             |      100 |      100 |      100 |      100 |                   |
  header-flags-enum.js         |      100 |      100 |      100 |      100 |                   |
  packet-from-binary.js        |      100 |      100 |      100 |      100 |                   |
  packet-header-from-binary.js |      100 |      100 |      100 |      100 |                   |
  packet-header-generator.js   |      100 |      100 |      100 |      100 |                   |
  packet-header-to-binary.js   |      100 |      100 |      100 |      100 |                   |
  packet-to-binary.js          |      100 |      100 |      100 |      100 |                   |
  unpack-integer-to-byte.js    |      100 |      100 |      100 |      100 |                   |
-------------------------------|----------|----------|----------|----------|-------------------|

Test Suites: 13 passed, 13 total
Tests:       134 passed, 134 total
Snapshots:   0 total
Time:        1.618s
Ran all test suites.

Serialisation format

Field	Bytes	Use
1	1 byte	Packet mode modifier switches
2	1 byte	Packet byte width
3	1 byte	Packet token
4	2 bytes	Prefix start
5	2 bytes	Prefix end

Field 1
An 8-bit bit field of packet mode modifier switches.

Index	Meaning
0	Future
1	Add an additional byte to the size field
2	Add an additional byte to the size field (partially implemented)
3	Add an additional byte to the size field (partially implemented)
4	Pure packet mode, a special packed format for small packets (partially implemented)
5	Future
6	Future
7	Packet has a prefix field (default is 2 16-bit fields)

Field 2
An 8-bit field representing the packet size.

Field 3
An 8-bit value representing the token

Field 4
(Only required for a packets with the prefix bit field (7) enabled)
Indicates the start index of a prefix value in the history window.

Field 5
(Only required for a packets with the prefix bit field (7) enabled)
Indicates the length of the prefix value in the history window.

Examples

The ./sampletarget folder contains small demonstration scripts which demonstrate interaction with the compress and inflate methods.

Integration test (integrationtest.sh)

I compressed the devil outta ./resources/testinput01.txt

    Input size: 1408
    Ouput size: 116
    IO   ratio: 0.08238636363636363

        0.15 real         0.12 user         0.03 sys
0.13
I inflated the devil outta ./resources/testinput01.txt.bzz
0.09

afc36de9b6fa04d767b3fd3823507d76f1ef86c2  ./resources/testinput01.txt
afc36de9b6fa04d767b3fd3823507d76f1ef86c2  ./resources/testinput01.txt.bzz.inflate

--

I compressed the devil outta ./resources/testinput02.txt

    Input size: 11380
    Ouput size: 159
    IO   ratio: 0.01397188049209139

        0.18 real         0.16 user         0.03 sys
0.17
I inflated the devil outta ./resources/testinput02.txt.bzz
0.07

014c2644798763fe3ed176602addfe7b7edf1b6a  ./resources/testinput02.txt
014c2644798763fe3ed176602addfe7b7edf1b6a  ./resources/testinput02.txt.bzz.inflate

--

I compressed the devil outta ./resources/blue.jpg

    Input size: 65879
    Ouput size: 180250
    IO   ratio: 2.736076746763005

       19.93 real        19.79 user         1.65 sys
19.80
I inflated the devil outta ./resources/blue.jpg.bzz
1.93

15566f7c74f6db40da040312100d89345beebdc8  ./resources/blue.jpg
15566f7c74f6db40da040312100d89345beebdc8  ./resources/blue.jpg.bzz.inflate

--

I compressed the devil outta ./resources/sample-ppp.pptx

    Input size: 47372
    Ouput size: 98329
    IO   ratio: 2.0756776154690533

       10.72 real        10.79 user         0.61 sys
10.80
I inflated the devil outta ./resources/sample-ppp.pptx.bzz
0.90

955b6d57c0ffa8ba129d01abbf91988e298a8445  ./resources/sample-ppp.pptx
955b6d57c0ffa8ba129d01abbf91988e298a8445  ./resources/sample-ppp.pptx.bzz.inflate

--

I compressed the devil outta ./resources/sails.bmp

    Input size: 394294
    Ouput size: 688269
    IO   ratio: 1.745573100275429

       80.38 real        77.68 user        11.49 sys
77.69
I inflated the devil outta ./resources/sails.bmp.bzz
20.55

65fb675d23b2dd658e4f43f143988579e76fe515  ./resources/sails.bmp
65fb675d23b2dd658e4f43f143988579e76fe515  ./resources/sails.bmp.bzz.inflate

runcompress.js

/libz7 ‹master*› % samplestarget/runcompress.js ilovematthewromanoilovematthewromanoilovematthewromanoilovematthewromanoilovematthewroma
noilovematthew
<Buffer 00 08 69 6c 6f 76 65 6d 61 74>
<Buffer 01 03 68 01 01>
<Buffer 01 03 77 06 01>
<Buffer 00 01 72>
<Buffer 01 03 6d 0b 01>
<Buffer 01 03 6e 09 01>
<Buffer 01 03 69 04 01>
<Buffer 01 03 6c 01 12>
<Buffer 01 03 6f 01 24>
<Buffer 01 03 77 0b 1a>
📥         input : ilovematthewromanoilovematthewromanoilovematthewromanoilovematthewromanoilovematthewromanoilovematthew
💤    compressed :ilovemathwrm
                             n	ilo$w

🙌         ratio : 51.9607843137255%

📥         input :ilovemathwrm
                             n	ilo$w

💤  decompressed : ilovematthewromanoilovematthewromanoilovematthewromanoilovematthewromanoilovematthewromanoilovematthew
🙌         ratio : 192.45283018867926%

filecompress.js

/libz7 ‹master*› % samplestarget/filecompress.js resources/testinput01.txt
I compressed the devil outta resources/testinput01.txt
        0.18 real         0.13 user         0.05 sys

fileinflate.js

samplestarget/fileinflate.js resources/testinput01.txt.bzz
I inflated the devil outta resources/testinput01.txt.bzz

bzz-decompress-debug.js

This is a quick and dirty convenience tool. It parses a .bzz binary format file (generated by filecompress.js), deserializes the packet, and it outputs each compression packet in the order they would be parsed. It also attaches context state, such contents of the dictionary

/libz7 ‹master*› % ./samplestarget/bzz-decompress-debug.js resources/testinput01.txt.bzz
╔═══════════════════════════╤═══════════════════════════╤═══════════════════════════╤═══════════════════════════╤═══════════════════════════╗
║ Token field               │ Prefix field              │ Packet header             │ Packet raw source buffer  │ History buffer            ║
╚═══════════════════════════╧═══════════════════════════╧═══════════════════════════╧═══════════════════════════╧═══════════════════════════╝
							Press Enter to advance.

╟───────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────╢
║ 74686520717569636b        │ N/A                       │ size=9, prefix=N/A prefix │ 74686520717569636b        │                           ║
║                           │                           │ ByteExtOne=N/A            │                           │                           ║
╟───────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────╢
║ 62                        │ 6,1                       │ size=3, prefix=true prefi │ 620601                    │ 74,68,65,20,71,75,69,63,6 ║
║                           │                           │ xByteExtOne=N/A           │                           │ b                         ║
╚═══════════════════════════╧═══════════════════════════╧═══════════════════════════╧═══════════════════════════╧═══════════════════════════╝
							Press Enter to advance.

╟───────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────╢
║ 726f776e                  │ N/A                       │ size=4, prefix=N/A prefix │ 726f776e                  │ 74,68,65,20,71,75,69,63,6 ║
║                           │                           │ ByteExtOne=N/A            │                           │ b,20,62                   ║
╟───────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────╢
║ 66                        │ 6,1                       │ size=3, prefix=true prefi │ 660601                    │ 74,68,65,20,71,75,69,63,6 ║
║                           │                           │ xByteExtOne=N/A           │                           │ b,20,62,72,6f,77,6e       ║
╟───────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────╢
║ 78                        │ 5,1                       │ size=3, prefix=true prefi │ 780501                    │ 74,68,65,20,71,75,69,63,6 ║
║                           │                           │ xByteExtOne=N/A           │                           │ b,20,62,72,6f,77,6e,20,66 ║
╚═══════════════════════════╧═══════════════════════════╧═══════════════════════════╧═══════════════════════════╧═══════════════════════════╝
							Press Enter to advance.

Observations

LZ77 is a very old compression algorithm. It isn't the most optimal, but it's the spark for a whole generation of compression systems and a very nice one to look at for a side project. I thought I'd list some observations I found interesting.

Compressing jpeg files with this implementation of LZ77 actually make it larger than the original file! It makes sense when you think about it. The sliding window is equivilent to a look up table (LZ78 showed this, citations when I have the spoons), so the thing making it compress like it does is Run Length Encodings (RLEs). The overhead for each compression packet will always be larger than the token assigned to the packet, so you only start to get returns once the prefix entries reference RLEs larger than storage overhead for each packet. I calculate the overhead for an 8 bit token to amount to a minimum overhead 48 bits, which means the prefix must be above 4 characters to actually give a reduction in storage requirements. Calculations are based on the assumption that each compression packet is stored as "T,S,E" where T is the token (8 bits), S (8 bits) is the prefix start index in the current sliding window frame and E (8 bits) is end index offset from S of the prefix. Those three pieces of data give 24 bits, the other 24 bits are the comma seperators and the terminator for each packet (newline, space etc). Anyway, the point I'm making is that JPEGs use much much more sophisticated compression strategies (plural!) in comparison to LZ77 which means the compressed output contains virtually no RLE blocks. I don't know a great deal about JPEG so far, but what I've read is exciting and I hope I can start taking those algorithms apart too.

spacekitcat/prototype-libz77