yfukasawa/LongQC

longqc


While running LongQC I got the following error. Can someone tell me what the problem is?
ValueError: truncated quality string in [my path to the fastq file]

Hi,
you also raised an issue in #6, correct?

Did you run the runqc subcommand? Can you provide the command you executed?
The runqc subcommand will soon be obsolete for PacBio (PacBio will stop providing scraps.bam, which the runqc subcommand requires).
So, if this is the case, I recommend running the sampleqc subcommand on your fastq file with a suitable profile (the -x option).

Yoshinori

I can't find any attachment??

Maybe that's because I replied by email.
I will paste the full log and the command I used in the following comment.

(python3) soumayachaikhi@MacBook-de-Soumaya LongQC % python longQC.py sampleqc -x ont-rapid -o ../assemble_quality ../3_2_GB.fastq

longQC:2022-06-21 16:41:05,726:169:INFO:Cmd: longQC.py sampleqc -x ont-rapid -o ../assemble_quality ../3_2_GB.fastq
longQC:2022-06-21 16:41:05,726:233:INFO:Preset "ont-rapid" was applied. Options --pb(--ont) is overwritten.
longQC:2022-06-21 16:41:07,766:306:INFO:Computation of the low complexity region started for a chunk 0
lq_mask:2022-06-21 16:41:09,427:111:INFO:New job was submitted: in->../assemble_quality/analysis/tmp_0.fastq, out->../assemble_quality/analysis/tmp_0.txt
longQC:2022-06-21 16:41:09,435:311:INFO:Adapter search is starting for a chunk 0.
longQC:2022-06-21 16:41:09,436:327:INFO:Computation of the GC fraction started for a chunk 0
lq_utils:2022-06-21 16:41:21,436:380:INFO:list for subsample is not initialized. Initializing now.
lq_adapt:2022-06-21 16:41:22,948:77:INFO:9744 reads were skipped due to their short lengths.
lq_adapt:2022-06-21 16:41:22,949:97:INFO:Adapter Sequence: GTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGTGCGCCGCTTCA, max identity:0.905660 and the number of trimmed reads: 436
longQC:2022-06-21 16:41:28,389:338:INFO:Adapter search has done for a chunk 0.
longQC:2022-06-21 16:41:28,390:342:INFO:subsample finished for chunk 0.
longQC:2022-06-21 16:41:29,927:306:INFO:Computation of the low complexity region started for a chunk 1
lq_mask:2022-06-21 16:41:33,241:111:INFO:New job was submitted: in->../assemble_quality/analysis/tmp_1.fastq, out->../assemble_quality/analysis/tmp_1.txt
longQC:2022-06-21 16:41:33,246:311:INFO:Adapter search is starting for a chunk 1.
longQC:2022-06-21 16:41:33,246:327:INFO:Computation of the GC fraction started for a chunk 1
lq_adapt:2022-06-21 16:41:45,053:77:INFO:4534 reads were skipped due to their short lengths.
lq_adapt:2022-06-21 16:41:45,056:97:INFO:Adapter Sequence: GTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGTGCGCCGCTTCA, max identity:0.905660 and the number of trimmed reads: 516
longQC:2022-06-21 16:41:51,539:338:INFO:Adapter search has done for a chunk 1.
longQC:2022-06-21 16:41:51,558:342:INFO:subsample finished for chunk 1.
longQC:2022-06-21 16:41:53,345:306:INFO:Computation of the low complexity region started for a chunk 2
lq_mask:2022-06-21 16:41:57,141:111:INFO:New job was submitted: in->../assemble_quality/analysis/tmp_2.fastq, out->../assemble_quality/analysis/tmp_2.txt
longQC:2022-06-21 16:41:57,141:311:INFO:Adapter search is starting for a chunk 2.
longQC:2022-06-21 16:41:57,142:327:INFO:Computation of the GC fraction started for a chunk 2
lq_adapt:2022-06-21 16:42:09,077:77:INFO:4823 reads were skipped due to their short lengths.
lq_adapt:2022-06-21 16:42:09,078:97:INFO:Adapter Sequence: GTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGTGCGCCGCTTCA, max identity:0.920000 and the number of trimmed reads: 719
longQC:2022-06-21 16:42:15,219:338:INFO:Adapter search has done for a chunk 2.
longQC:2022-06-21 16:42:15,237:342:INFO:subsample finished for chunk 2.
longQC:2022-06-21 16:42:16,765:306:INFO:Computation of the low complexity region started for a chunk 3
lq_mask:2022-06-21 16:42:20,208:111:INFO:New job was submitted: in->../assemble_quality/analysis/tmp_3.fastq, out->../assemble_quality/analysis/tmp_3.txt
longQC:2022-06-21 16:42:20,208:311:INFO:Adapter search is starting for a chunk 3.
longQC:2022-06-21 16:42:20,209:327:INFO:Computation of the GC fraction started for a chunk 3
lq_adapt:2022-06-21 16:42:31,980:77:INFO:4626 reads were skipped due to their short lengths.
lq_adapt:2022-06-21 16:42:31,982:97:INFO:Adapter Sequence: GTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGTGCGCCGCTTCA, max identity:0.920000 and the number of trimmed reads: 578
longQC:2022-06-21 16:42:38,062:338:INFO:Adapter search has done for a chunk 3.
longQC:2022-06-21 16:42:38,085:342:INFO:subsample finished for chunk 3.
Traceback (most recent call last):
File "longQC.py", line 956, in <module>
main(args)
File "longQC.py", line 62, in main
args.handler(args)
File "longQC.py", line 299, in command_sample
for (reads, n_seqs, n_bases) in open_seq_chunk(args.input, file_format_code, chunk_size=args.mem*1024**3, is_upper=True):
File "/Users/soumayachaikhi/Bioinfo/Assemblyproject/LongQC/lq_utils.py", line 65, in open_seq_chunk
yield from parse_fastx_chunk(fn, chunk_size, is_upper=is_upper)
File "/Users/soumayachaikhi/Bioinfo/Assemblyproject/LongQC/lq_utils.py", line 269, in parse_fastx_chunk
for e in f:
File "pysam/libcfaidx.pyx", line 653, in pysam.libcfaidx.FastxFile.next
ValueError: truncated quality string in ../3_2_GB.fastq
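
For anyone hitting the same traceback: pysam raises this error while parsing, so one thing worth ruling out is a damaged input file. Below is a minimal sketch of a standalone check, not part of LongQC. The helper names (`open_maybe_gzip`, `find_first_bad_record`) are made up for illustration, and it assumes the usual four-line FASTQ layout with unwrapped sequences.

```python
import gzip
from itertools import islice


def open_maybe_gzip(path):
    """Open plain or gzip-compressed text, chosen by the .gz suffix."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def find_first_bad_record(path):
    """Return (record_number, reason) for the first malformed record, or None.

    Hypothetical helper: assumes four lines per record (no wrapped sequences).
    """
    index = 0
    with open_maybe_gzip(path) as handle:
        while True:
            try:
                record = [line.rstrip("\n") for line in islice(handle, 4)]
            except EOFError:
                # The gzip module raises EOFError when the stream is cut short.
                return index + 1, "gzip stream ended early"
            if not record:
                return None  # clean end of file
            index += 1
            if len(record) < 4:
                return index, "incomplete record (file ends early)"
            header, seq, plus, qual = record
            if not header.startswith("@"):
                return index, "header does not start with '@'"
            if not plus.startswith("+"):
                return index, "separator line does not start with '+'"
            if len(seq) != len(qual):
                return index, f"quality length {len(qual)} != sequence length {len(seq)}"
```

Calling `find_first_bad_record("../3_2_GB.fastq")` would return `None` for a well-formed file, or the number of the first suspicious record together with a short reason. If it does report a record, the file itself (an interrupted download or copy, for example) is the likely cause rather than LongQC.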

I also had the same issue!

Are there any insights into this? I was testing a pipeline on some public data (SRR15206231).

Note that this pipeline worked on another dataset, SRS17583785.

It looks like it's related to the chunking process. Could this be due to some kind of memory limitation? The HiFi reads comprise 102.2 Gbases (~60 GB). The previous test was 10x smaller.

(running in nextflow, containerised)

Redacted@Redacted:~/HDD/test_run$ nextflow run asm_pipeline.nf -with-report -with-trace -with-timeline -with-dag dag.png --accession_id SRR15206231
N E X T F L O W ~ version 23.04.4
Launching asm_pipeline.nf [elegant_hypatia] DSL2 - revision: d62de36e54
executor > local (4)
executor > local (4)
[- ] process > FASTQC (FASTQC on SRR15206231) -
[56/ffa7bb] process > LONGQC (LONGQC on SRR15206231) [100%] 1 of 1, failed: 1 ✘
[- ] process > NANOPLOT (NANOPLOT on SRR15206231) -
[- ] process > HIFIADAPT (HIFIADAPT on SRR15206231) -
[- ] process > HIFIASM -
ERROR ~ Error executing process > 'LONGQC (LONGQC on SRR15206231)'

Caused by:
Process LONGQC (LONGQC on SRR15206231) terminated with an error exit status (1)

Command executed:

/opt/LongQC/longQC.py sampleqc --index 400M --ncpu 8 -m 2 -x pb-hifi -o longqc_SRR15206231_output SRR15206231_subreads.fastq.gz

Command exit status:
1

Command output:
(empty)

Command error:
longQC:2023-10-03 16:32:45,888:170:INFO:Cmd: /opt/LongQC/longQC.py sampleqc --index 400M --ncpu 8 -m 2 -x pb-hifi -o longqc_SRR15206231_output SRR15206231_subreads.fastq.gz
longQC:2023-10-03 16:32:45,888:234:INFO:Preset "pb-hifi" was applied. Options --pb(--ont) is overwritten.
Traceback (most recent call last):
File "/opt/LongQC/longQC.py", line 957, in <module>
main(args)
File "/opt/LongQC/longQC.py", line 63, in main
args.handler(args)
File "/opt/LongQC/longQC.py", line 300, in command_sample
for (reads, n_seqs, n_bases) in open_seq_chunk(args.input, file_format_code, chunk_size=args.mem*1024**3, is_upper=True):
File "/opt/LongQC/lq_utils.py", line 65, in open_seq_chunk
yield from parse_fastx_chunk(fn, chunk_size, is_upper=is_upper)
File "/opt/LongQC/lq_utils.py", line 269, in parse_fastx_chunk
for e in f:
File "pysam/libcfaidx.pyx", line 651, in pysam.libcfaidx.FastxFile.next
ValueError: truncated quality string in SRR15206231_subreads.fastq.gz

Work dir:
/media/Redacted/Redacted/Redacted/test_run/work/56/ffa7bb44ea6c6e9ca147d0f918b7f0

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

-- Check '.nextflow.log' file for details

I changed four things at once and the problem was fixed, so I can't say which one actually fixed it. Anyone with the same issue could try any of them, but some are specific to nextflow/pipelines.

  1. I changed my LongQC -x argument from pb-hifi to pb-sequel
  2. I used a local file rather than pulling directly from SRA ('fromSRA' channel in nextflow)
  3. I increased threads (--ncpu) from 8 to 24
  4. I ran my pipeline serially, as in, LongQC was the only major process running on the machine
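
Related to point 2: before feeding a local `.fastq.gz` to LongQC, it may be worth confirming that the file decompresses cleanly end to end, since an interrupted download would be one plausible way to end up with a cut-off record. This is a rough sketch using only the Python standard library. The function name `gzip_is_intact` is my own, and I have not confirmed that a truncated download was the cause here.

```python
import gzip
import zlib


def gzip_is_intact(path, block_size=1 << 20):
    """Return True if the whole gzip stream decompresses without error.

    Hypothetical helper: reads and discards the data block by block,
    similar in spirit to running `gzip -t` on the file.
    """
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(block_size):
                pass
    except (EOFError, gzip.BadGzipFile, zlib.error):
        # EOFError: stream ended early; the others: corrupt or non-gzip data.
        return False
    return True
```

For example, `gzip_is_intact("SRR15206231_subreads.fastq.gz")` returning `False` would point at the file rather than at the LongQC options.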

Hope this helps!