/ctw

Context Tree Weighting

Primary LanguageJupyter NotebookBSD 3-Clause "New" or "Revised" LicenseBSD-3-Clause

Context Tree Weighting

Documentation

Package ctw provides an implementation of the Context Tree Weighting algorithm. Also contained is an implementation of the Rissanen-Langdon Arithmetic Coding algorithm, which is combined with Context Tree Weighting to create a lossless compression/decompression utility.

Below is an example of using this package to compress Lincoln's Gettysburg address:

go run compress/main.go gettysburg.txt > gettys.ctw
cat gettys.ctw | go run decompress/main.go > gettys.dctw
diff gettysburg.txt gettys.dctw

The results are noticeably superior to that of other commercial applications on a Mac OS X:

  • Original: 1463
  • tar.gz: 993
  • 7z: 908
  • zip: 874
  • xz: 828
  • CTW: 772

Reference: F.M.J. Willems and Tj. J. Tjalkens, Complexity Reduction of the Context-Tree Weighting Algorithm: A Study for KPN Research, Technical University of Eindhoven, EIDMA Report RS.97.01.

Full documentation at https://godoc.org/github.com/fumin/ctw.

Applications

A demonstration of using this compression tool to study mammalian evolution and construct the phylogeny of the SARS virus can be found here.

Testing

go test

Questions

  • Why does increasing the depth above 48 not improve the compression of gettysburg.txt? Depth 48 gives 772 bytes, while depth 60 also gives 772 bytes.
  • The exposition in https://cs.anu.edu.au/courses/comp4620/2015/slides-ctw.pdf gives a CTW based way of predicting the next bit. However, it is not clear how should we predict the next say 10 bits, without iterating through the 1024 different possibilities.