wdecoster/NanoPlot

ValueError: Invalid character in quality string

Closed · 1 comment

I am running NanoPlot on three fastq files: the original fastq, a fastq with only its simplex reads, and a fastq with only its duplex reads.
NanoPlot crashes with the error "ValueError: Invalid character in quality string" on the original fastq but runs fine on the simplex and duplex ones.
I checked the log of the unaligned BAM to fastq conversion step to see whether the generated fastq was truncated, but it did not show any errors.
The source BAM used to generate all three files is also intact and not truncated.
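For anyone wanting to confirm the truncation check independently of the conversion log, here is a minimal sketch. It assumes a gzipped FASTQ with strict four-line records (no wrapped sequence lines); `count_fastq_lines` is a hypothetical helper name, not part of NanoPlot or nanoget.

```python
import gzip
import sys


def count_fastq_lines(path):
    """Read a gzipped file to the end and return its line count.

    Python's gzip module raises EOFError if the compressed stream
    stops before its end-of-stream marker, so simply reaching the
    end is already evidence that the gzip layer is not truncated.
    """
    n = 0
    with gzip.open(path, "rb") as fq:
        for _ in fq:
            n += 1
    return n


if __name__ == "__main__" and len(sys.argv) > 1:
    lines = count_fastq_lines(sys.argv[1])
    # Assumption: four lines per record; a remainder hints at a cut-off record.
    print(f"{lines} lines, {lines // 4} records, remainder {lines % 4}")
```

A non-zero remainder, or an EOFError while reading, would point at truncation. A clean pass does not rule out a corrupted quality line in the middle of the file.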
Here is the NanoPlot log file:
2023-09-06 12:33:41,910 Python version is: 3.10.11 | packaged by conda-forge | (main, May 10 2023, 18:58:44) [GCC 11.3.0]
2023-09-06 12:33:42,161 Nanoget: Starting to collect statistics from plain fastq file.
2023-09-06 12:33:42,162 Nanoget: Decompressing gzipped fastq /home/sofia/Mala_Quartet/BNG69/BNG69_pass.fastq.gz
2023-09-06 13:12:21,982 Invalid character in quality string
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/concurrent/futures/process.py", line 205, in _process_chunk
return [fn(*args) for args in chunk]
File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/concurrent/futures/process.py", line 205, in <listcomp>
return [fn(*args) for args in chunk]
File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/site-packages/nanoget/extraction_functions.py", line 396, in process_fastq_plain
data=[res for res in extract_from_fastq(inputfastq) if res],
File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/site-packages/nanoget/extraction_functions.py", line 396, in <listcomp>
data=[res for res in extract_from_fastq(inputfastq) if res],
File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/site-packages/nanoget/extraction_functions.py", line 407, in extract_from_fastq
for rec in SeqIO.parse(fq, "fastq"):
File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/site-packages/Bio/SeqIO/Interfaces.py", line 72, in __next__
return next(self.records)
File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/site-packages/Bio/SeqIO/QualityIO.py", line 1134, in iterate
raise ValueError("Invalid character in quality string") from None
ValueError: Invalid character in quality string
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/site-packages/nanoplot/NanoPlot.py", line 61, in main
datadf = get_input(
File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/site-packages/nanoget/nanoget.py", line 110, in get_input
dfs=[out for out in executor.map(extraction_function, files)],
File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/site-packages/nanoget/nanoget.py", line 110, in <listcomp>
dfs=[out for out in executor.map(extraction_function, files)],
File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/concurrent/futures/process.py", line 575, in _chain_from_iterable_of_lists
for element in iterable:
File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
yield _result_or_cancel(fs.pop())
File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
return fut.result(timeout)
File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
ValueError: Invalid character in quality string

Hi,

That error is raised by Biopython, which NanoPlot uses for parsing the fastq file. It is quite lenient regarding fastq formatting, but it doesn't seem to like your file :)

Do you think you could share it?

Wouter
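If sharing the whole file is impractical, it may help to locate the offending record first. Below is a minimal sketch, not part of NanoPlot, nanoget, or Biopython. It assumes strict four-line FASTQ records and that valid quality characters are ASCII 33 (`!`) to 126 (`~`), which is the Sanger/Phred+33 range as far as I know. `first_bad_quality` is a hypothetical helper name.

```python
import gzip
import sys


def first_bad_quality(handle, lo=33, hi=126):
    """Return (record_number, header, bad_chars) for the first record
    whose quality line has a character outside [lo, hi], else None.

    Assumes four lines per record: header, sequence, '+', quality.
    """
    record = 0
    while True:
        header = handle.readline()
        if not header:  # end of file, nothing suspicious found
            return None
        handle.readline()  # sequence line, not inspected here
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\r\n")
        record += 1
        bad = sorted({c for c in qual if not lo <= ord(c) <= hi})
        if bad:
            return record, header.rstrip("\r\n"), bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # errors="replace" keeps the scan going past undecodable bytes.
    with gzip.open(sys.argv[1], "rt", errors="replace") as fq:
        print(first_bad_quality(fq))
```

Run it with the path to the gzipped fastq as the only argument. The reported record number and read header should make it possible to extract just the few records around the problem and attach those to the issue. If the records in the file are not strictly four lines each, the scan will drift out of frame and may report a false hit, which is itself a useful clue.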