
Our aim: a streaming gzip decompression library for biological sequences

Primary language: C++. License: MIT.

RabbitZ

A streaming gzip decompression library.

The project is under development. For now, it is a stabilized version of pugz.

Getting Started

A Linux system on a recent x86_64 CPU is required.

Installing

Type:

make

For maximal performance, disable assertions with:

make asserts=0

Usage

./gunzip -t 8 file.gz

Counting lines is much faster, because it needs no thread synchronization:

./gunzip -l -t 8 file.gz

Test

We provide a small example:

cd example
bash test.sh

Our contribution

We found that pugz frequently fails on files generated by pigz, especially at high compression levels. After studying the ideas in the paper and analyzing the code, we identified the cause of the error and fixed it:

As described in Section VI-A of the paper, thread i detects the start position of its first DEFLATE block by enumeration and then passes this position to thread i-1, which uses it as the end position of its own range. In the code, however, this hand-off is not blocking: thread i-1 may finish its work before it receives the position from thread i, which leads to an error.
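
The hand-off described above can be made blocking in several ways. The following is a minimal sketch of one of them, using std::promise and std::future; it is not the actual pugz or RabbitZ code. The names ChunkRange, find_block_start and decompress_chunks are hypothetical, and the block-start search is replaced by a stand-in that simply adds a fixed offset to the guessed position.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical per-thread result: the byte range a thread is responsible for.
struct ChunkRange {
    std::size_t start;
    std::size_t end;
};

// Stand-in for the enumeration-based search for a DEFLATE block start.
// A real implementation would scan the bitstream from `guess` onward;
// here we pretend the block boundary lies 3 bytes after the guess.
std::size_t find_block_start(std::size_t guess) {
    return guess + 3;
}

std::vector<ChunkRange> decompress_chunks(std::size_t file_size,
                                          std::size_t n_threads) {
    assert(n_threads > 0);

    // start_of[i] is fulfilled by thread i and consumed by thread i-1.
    std::vector<std::promise<std::size_t>> start_of(n_threads);
    std::vector<std::future<std::size_t>> start_futures;
    for (auto& p : start_of) start_futures.push_back(p.get_future());

    std::vector<ChunkRange> ranges(n_threads);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < n_threads; ++i) {
        workers.emplace_back([&, i] {
            // Thread 0 starts at the known beginning of the stream.
            std::size_t start =
                (i == 0) ? 0 : find_block_start(i * file_size / n_threads);
            start_of[i].set_value(start);  // publish the start to thread i-1

            // ... decompression of the chunk would happen here ...

            // Block until thread i+1 has published its start position.
            // The last thread simply runs to the end of the file.
            std::size_t end = (i + 1 == n_threads)
                                  ? file_size
                                  : start_futures[i + 1].get();
            ranges[i] = {start, end};
        });
    }
    for (auto& w : workers) w.join();
    return ranges;
}
```

Because future::get() waits until the matching promise is fulfilled, thread i-1 cannot finalize its range before thread i has reported its block start, regardless of which thread runs faster. Calling decompress_chunks(1000, 4) in this sketch yields contiguous ranges in which each thread's end equals the next thread's start.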

TODO

  • Fix get context error
  • Create a streaming API
  • Speed up pugz and improve thread utilization
  • Support blocked/multipart files