extract_kraken_reads.py does not keep base quality score

Question


BenjaminDelisle opened this issue 2 years ago · 2 comments

Hi,

I just realized that my FASTQ files post-processed by extract_kraken_reads.py have lost their base quality scores. I found this by validating my SAM file with the GATK ValidateSamFile command, which gave me this warning in SUMMARY mode:
Error Type                    Count
WARNING:QUALITY_NOT_STORED    100
And also in VERBOSE mode:
WARNING::QUALITY_NOT_STORED:Record 1, Read name M04938:309:000000000-K2R7T:1:1101:14275:1163, QUAL field is set to * (unspecified quality scores), this is allowed by the SAM specification but many tools expect reads to include qualities
WARNING::QUALITY_NOT_STORED:Record 2, Read name M04938:309:000000000-K2R7T:1:1101:14275:1163, QUAL field is set to * (unspecified quality scores), this is allowed by the SAM specification but many tools expect reads to include qualities

It goes on like this for the first hundred reads, but it applies to basically all reads. When I aligned the FASTQ files from before the script instead, the base quality scores were present in the resulting SAM file.
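Because the GATK report only lists the first hundred records, one way to confirm that every read is affected is to count the alignment records whose QUAL column (the 11th SAM field) is `*` and compare that with the total. A minimal sketch, assuming an uncompressed SAM file and a standard awk; the function names are hypothetical:

```shell
# Count alignment records whose QUAL field (column 11) is "*",
# i.e. reads stored without base quality scores.
count_missing_qual() {
    # Header lines start with "@"; a SAM QNAME can never start with "@",
    # so this reliably skips the header. Fields are tab-separated.
    awk -F'\t' '!/^@/ && $11 == "*" { n++ } END { print n + 0 }' "$1"
}

# Count all alignment records, for comparison.
count_records() {
    awk -F'\t' '!/^@/ { n++ } END { print n + 0 }' "$1"
}
```

If `count_missing_qual` and `count_records` print the same number for the SAM file, none of the reads carry quality scores, which points at the input FASTQ rather than at the aligner.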

Is there something that I might have missed or done wrong?

This seems very straightforward with both extract_kraken_reads.py and the bwa mem aligner:

python extract_kraken_reads.py \
    -r /data/devel/mtb_wgs/Output_kraken2/$1/$1"_kraken2_report.txt" \
    -k /data/devel/mtb_wgs/Output_kraken2/$1/$1"_krakenout.txt" \
    -s1 /data/devel/mtb_wgs/Output_fastp/$1/$1"_trimmed-dedup.fq.gz" \
    -s2 /data/devel/mtb_wgs/Output_fastp/$1/$2"_trimmed-dedup.fq.gz" \
    -o /data/devel/mtb_wgs/Output_kraken2/$1/$1"_extracted.fastq" \
    -o2 /data/devel/mtb_wgs/Output_kraken2/$1/$2"_extracted.fastq" \
    --taxid 1763 --include-children

bwa mem H37Rv \
    -R "@RG\tID:$id\tSM:$sm$id\tLB:$sm$id\tPL:ILLUMINA" \
    $path_fastqKraken$1"_extracted.fastq.gz" \
    $path_fastqKraken$2"_extracted.fastq.gz" \
    > $path_Output_bwa_gatk$1"_bwa_mem_align_H37Rv.sam"

Thanks for the help, and let me know if you need more information.
Benjamin

Answer 1 · 2023-05-23T21:13:45.000Z

Hi Benjamin,

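I am not the developer of this software, but I believe you need to supply the --fastq-output flag when running extract_kraken_reads.py. Without it, the script writes FASTA, which has no quality line, so bwa mem has nothing to put in the QUAL field. With the flag, you should get FASTQ files that include the quality scores.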

Best,
Dave
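A quick sanity check before aligning is to look at the first character of each extracted file: FASTQ records start with `@`, FASTA records with `>`. A minimal sketch, assuming the files are either plain text or gzip-compressed; the function name `fastx_format` is hypothetical, and the check only inspects the first record rather than validating the whole file:

```shell
# Report whether a (possibly gzip-compressed) sequence file looks like
# FASTQ or FASTA, based on the first character of the first record.
fastx_format() {
    local first
    # gzip -cdf decompresses .gz input and copies plain text through unchanged.
    first=$(gzip -cdf -- "$1" | head -c 1)
    case "$first" in
        @) echo "FASTQ" ;;
        '>') echo "FASTA" ;;
        *) echo "unknown" ;;
    esac
}
```

Running it on the files passed to bwa mem should print `FASTQ` once extract_kraken_reads.py has been rerun with --fastq-output; a `FASTA` result explains the `*` in the QUAL field regardless of the file's `.fastq` extension.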

Answer 2 · 2023-05-24T15:13:24.000Z

Thank you, Dave; the flag fixed my problem!

Best,
Benjamin