Unicode error reading large fastq with index?
Closed this issue · 2 comments
schorlton commented
Got a Unicode error while iterating over a large FASTQ. It seems to reproduce on any large FASTQ.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "<string>", line 1, in <listcomp>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 0: invalid continuation byte
Steps to reproduce:
# Warning: this will download a ~15 GB FASTQ. Probably overkill, but it reproduces the issue.
docker run --rm -it -v $(pwd):$(pwd) -w $(pwd) ncbi/sra-tools fasterq-dump --progress SRR15035500
docker run --rm -it -v $(pwd):$(pwd) -w $(pwd) mambaorg/micromamba bash -c 'micromamba install -y pyfastx==1.0.1 -c conda-forge -c bioconda -c defaults && python -c "import pyfastx; [print(read) for read in pyfastx.Fastq(\"SRR15035500.fastq\")]"'
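For clarity, the Python one-liner inlined in the second command is equivalent to the standalone script below (the "<string>" frames in the traceback come from running it via python -c). As I understand it, pyfastx builds its index on first access, and the error is raised partway through iteration once the file is large enough.
import pyfastx

# Iterate over every read in the FASTQ; printing a read emits its record.
# On a sufficiently large file this loop raises the UnicodeDecodeError above.
for read in pyfastx.Fastq("SRR15035500.fastq"):
    print(read)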
Not sure if it's related to 6bfa15b / #39 / #56? Thank you as always for the great tool!
lmdu commented
Thank you! I have fixed it in version 1.1.0.
schorlton commented
Thank you! 🙏