[FEATURE] support gzip'd input fastq files
nick-youngblut opened this issue · 0 comments
Is there an existing issue for this?
- I have searched the existing issues
Have you loaded the SQANTI3.env conda environment?
- I have loaded the SQANTI3.env conda environment
Problem description
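Passing a gzip'd fastq input file to sqanti3_qc.py throws the error below. The file is opened as plain UTF-8 text, so decoding fails at byte 0x8b in position 1, which is the second byte of the gzip magic number (0x1f 0x8b):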
Rscript (R) version 4.3.3 (2024-02-29)
Cleaning up isoform IDs...
Traceback (most recent call last):
  File "/home/nickyoungblut/dev/bfx/SQANTI3-5.2.1/./sqanti3_qc.py", line 2525, in <module>
    main()
  File "/home/nickyoungblut/dev/bfx/SQANTI3-5.2.1/./sqanti3_qc.py", line 2445, in main
    args.isoforms = rename_isoform_seqids(args.isoforms, args.force_id_ignore)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nickyoungblut/dev/bfx/SQANTI3-5.2.1/./sqanti3_qc.py", line 2131, in rename_isoform_seqids
    if h.readline().startswith('@'): type = 'fastq'
       ^^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
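For reference, the failure can be reproduced outside SQANTI3 with a minimal sketch (the file name and FASTQ record are made up for illustration). Every gzip stream begins with the magic bytes 0x1f 0x8b; 0x1f is valid ASCII, but 0x8b is not a valid UTF-8 start byte, so a text-mode read fails at position 1, matching the traceback.

```python
import gzip
import os
import tempfile

# Write a tiny gzip-compressed FASTQ record (hypothetical test data).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "reads.fastq.gz")
with gzip.open(path, "wt") as out:
    out.write("@read1\nACGT\n+\nIIII\n")

# gzip files start with the magic bytes 0x1f 0x8b.
with open(path, "rb") as raw:
    magic = raw.read(2)
print(magic)  # b'\x1f\x8b'

# Reading the compressed file as UTF-8 text fails like the traceback above.
try:
    with open(path, encoding="utf-8") as h:
        h.readline()
    error = None
except UnicodeDecodeError as exc:
    error = exc
print(error)
```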
Code sample
Instead of:
with open(input_fasta) as h:
    if h.readline().startswith('@'): type = 'fastq'
...just update to:
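import gzip

if input_fasta.endswith('.gz'):
    open_func = gzip.open
else:
    open_func = open
# Open in text mode ('rt'): gzip.open() defaults to binary ('rb'), and its
# bytes lines would make startswith('@') raise a TypeError
with open_func(input_fasta, 'rt') as h:
    if h.readline().startswith('@'): type = 'fastq'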
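A possible variation on the extension check, sketched under assumptions: sniff the first two bytes for the gzip magic number instead of trusting the '.gz' suffix. `open_maybe_gzip` and `detect_seq_type` are hypothetical helpers, not existing SQANTI3 functions, and treating any file whose first line does not start with '@' as FASTA is an assumption about how the surrounding code defaults.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"

def open_maybe_gzip(path):
    """Hypothetical helper: open `path` in text mode, decompressing if gzip'd.

    Detection uses the gzip magic bytes rather than the '.gz' extension.
    """
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt")  # 'rt' yields str lines, like open()
    return open(path)

def detect_seq_type(path):
    """Hypothetical helper: 'fastq' if the first line starts with '@', else 'fasta'."""
    with open_maybe_gzip(path) as h:
        return "fastq" if h.readline().startswith("@") else "fasta"

# Usage with hypothetical test files
tmpdir = tempfile.mkdtemp()
gz_fastq = os.path.join(tmpdir, "reads.fastq.gz")
with gzip.open(gz_fastq, "wt") as out:
    out.write("@read1\nACGT\n+\nIIII\n")
plain_fasta = os.path.join(tmpdir, "isoforms.fasta")
with open(plain_fasta, "w") as out:
    out.write(">PB.1.1\nACGT\n")

print(detect_seq_type(gz_fastq))     # fastq
print(detect_seq_type(plain_fasta))  # fasta
```

Sniffing the header costs one extra open() call, but it also handles compressed files that lack a .gz suffix.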
Error
No response
Anything else?
Given how large read files usually are, supporting gzip'd input would be very helpful.
I'm surprised that gzip'd input is not already supported.