seq-lang/seq

Adding bgzip integration?

jelber2 opened this issue · 3 comments

Hi,

Not really an issue, more of a request?

bgzip can decompress in parallel, provided the .gz file was compressed with bgzip. I don't know of many projects that take advantage of this. It might be very interesting for Seq to first check whether a .gz file was compressed with bgzip, and then either call the bgzip binary if it is present or use a library somehow included with Seq (for something similar done in Python, see https://pypi.org/project/bgzip/). BBTools/BBMap (https://sourceforge.net/projects/bbmap/) can take advantage of systems where bgzip is installed, and I have seen quite a big performance increase when using bgzip on bgzipped files.
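
To make the "check first, then call the binary" idea concrete, here is a rough Python sketch (not Seq code). It relies on the BGZF layout, in which each gzip member sets the FEXTRA flag and carries an extra subfield tagged `B`, `C` with a 2-byte payload. The helper names `is_bgzf` and `bgzip_command` are made up for illustration, and the thread count of 4 is an arbitrary assumption.

```python
import shutil
import struct


def is_bgzf(path):
    """Return True if the first gzip member has the BGZF 'BC' extra subfield."""
    with open(path, "rb") as fh:
        header = fh.read(12)
        if len(header) < 12:
            return False
        # Fixed gzip header: ID1, ID2, CM, FLG, MTIME, XFL, OS, then XLEN.
        id1, id2, method, flags, _mtime, _xfl, _os, xlen = struct.unpack(
            "<BBBBIBBH", header
        )
        # Must be gzip (1f 8b), deflate (8), with FEXTRA (bit 2) set.
        if (id1, id2, method) != (0x1F, 0x8B, 8) or not flags & 4:
            return False
        extra = fh.read(xlen)
        if len(extra) < xlen:
            return False
    # Walk the extra subfields looking for SI1='B', SI2='C', SLEN=2.
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if (si1, si2, slen) == (ord("B"), ord("C"), 2):
            return True
        pos += 4 + slen
    return False


def bgzip_command(path, threads=4):
    """Return a bgzip command line for parallel decompression to stdout,
    or None if the file is not BGZF or bgzip is not on PATH."""
    if is_bgzf(path) and shutil.which("bgzip"):
        return ["bgzip", "-d", "-c", "-@", str(threads), str(path)]
    return None
```

When `bgzip_command` returns `None`, the caller would fall back to its ordinary single-threaded gzip path; otherwise the returned list can be handed to `subprocess.Popen` and read from its stdout.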

Thanks for the suggestion!

In case you need it urgently, you can probably dynamically load the C bgzip library via cimport and then use the underlying C API.
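
As a rough illustration of the same "load the C library and call its API" approach, here is a Python `ctypes` sketch (an analogue, not Seq's cimport). It assumes the C library in question is htslib, which provides the BGZF functions `bgzf_open`, `bgzf_mt`, `bgzf_read` and `bgzf_close`, and that it is installed as a shared library that `find_library("hts")` can locate. The helper names and the `256` sub-block count are illustrative choices, not anything taken from Seq.

```python
import ctypes
import ctypes.util


def load_htslib():
    """Try to load htslib and declare the BGZF functions used below.
    Returns None if the library or any of the symbols is unavailable."""
    name = ctypes.util.find_library("hts")  # assumption: htslib is installed
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
        lib.bgzf_open.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
        lib.bgzf_open.restype = ctypes.c_void_p
        lib.bgzf_mt.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_int]
        lib.bgzf_mt.restype = ctypes.c_int
        lib.bgzf_read.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t]
        lib.bgzf_read.restype = ctypes.c_ssize_t
        lib.bgzf_close.argtypes = [ctypes.c_void_p]
        lib.bgzf_close.restype = ctypes.c_int
    except (OSError, AttributeError):
        return None
    return lib


def read_bgzf(path, threads=4, chunk=1 << 16):
    """Decompress a BGZF file into bytes via htslib.
    Returns None when htslib cannot be loaded."""
    lib = load_htslib()
    if lib is None:
        return None
    fp = lib.bgzf_open(str(path).encode(), b"r")
    if not fp:
        raise OSError(f"bgzf_open failed for {path}")
    try:
        if threads > 1:
            # Ask htslib for a worker pool; 256 sub-blocks is an arbitrary pick.
            lib.bgzf_mt(fp, threads, 256)
        buf = ctypes.create_string_buffer(chunk)
        out = bytearray()
        while True:
            n = lib.bgzf_read(fp, buf, chunk)
            if n < 0:
                raise OSError(f"bgzf_read failed for {path}")
            if n == 0:  # end of file
                break
            out += buf.raw[:n]
        return bytes(out)
    finally:
        lib.bgzf_close(fp)
```

Reading the whole file into memory keeps the sketch short; a real reader would yield chunks or lines instead.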

Thanks! I might give that a try.

FYI, https://github.com/seq-lang/seq/blob/master/stdlib/core/file.seq contains the standard file / gzip implementation. We also use the underlying C API directly there.