Support xz-compressed files
joverlee521 opened this issue · 2 comments
Hello @lmdu,
I am planning to use pyfastx within Nextstrain's Augur to support a new data curation command, and it would be really helpful to be able to support xz-compressed files. Would you be open to extending pyfastx to support xz-compressed files?
Groups working with large files are using xz to save space because xz has a better compression ratio than gzip. For example, Nextstrain hosts a file of all GenBank SARS-CoV-2 genomes that is xz-compressed.
Random access to xz-compressed files is possible, provided the file was originally compressed in multiple short blocks. python-xz is an example of this in pure Python and xz-random-access is an example of this in C.
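To illustrate the idea, here is a minimal sketch using only Python's standard-library lzma module. It is not how python-xz, xz-random-access, or pyfastx work internally. As a simplifying assumption it writes each chunk as its own concatenated xz stream (rather than as blocks inside a single stream) and keeps a small in-memory index, so that a byte range can be read by decompressing only the chunks that overlap it. The helper names compress_chunks and read_range are hypothetical.

```python
import io
import lzma


def compress_chunks(chunks):
    """Compress each chunk as an independent xz stream.

    Returns the concatenated compressed bytes plus an index of
    (compressed_offset, compressed_length,
     uncompressed_offset, uncompressed_length) tuples.
    """
    out = io.BytesIO()
    index = []
    uncompressed_pos = 0
    for chunk in chunks:
        comp = lzma.compress(chunk, format=lzma.FORMAT_XZ)
        index.append((out.tell(), len(comp), uncompressed_pos, len(chunk)))
        out.write(comp)
        uncompressed_pos += len(chunk)
    return out.getvalue(), index


def read_range(blob, index, start, length):
    """Return uncompressed bytes [start, start + length).

    Only the chunks overlapping the requested range are decompressed.
    """
    end = start + length
    parts = []
    for coff, clen, uoff, ulen in index:
        if uoff + ulen <= start or uoff >= end:
            continue  # chunk lies entirely outside the requested range
        data = lzma.decompress(blob[coff:coff + clen], format=lzma.FORMAT_XZ)
        parts.append(data[max(start - uoff, 0):end - uoff])
    return b"".join(parts)


if __name__ == "__main__":
    records = [b">seq1\nACGT\n", b">seq2\nGGCC\n", b">seq3\nTTAA\n"]
    blob, index = compress_chunks(records)
    # Fetch the second record without touching the first or third chunk.
    print(read_range(blob, index, 11, 11))
```

The concatenated output is still a valid .xz file that decompresses to the full input. The trade-off is that smaller chunks give finer-grained access but a somewhat worse compression ratio.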
Thank you!
Thank you! In the future, I will consider adding support for parsing xz-compressed FASTA/Q files.
Great to hear @lmdu! We are slowly moving from xz to zstd because it offers faster compression/decompression with no loss in compression ratio compared to xz.
Just like xz random access, zstd random access seems to be possible as well. I've found these resources: