extract_kraken_reads.py does not keep base quality score
BenjaminDelisle opened this issue · 2 comments
Hi,
I just realized that my FASTQ files post-processed by extract_kraken_reads.py have lost their base quality scores. I found this by validating my SAM file with the GATK ValidateSamFile command, which gave me this warning in SUMMARY mode:
Error Type                    Count
WARNING:QUALITY_NOT_STORED    100
And also in VERBOSE mode:
WARNING::QUALITY_NOT_STORED:Record 1, Read name M04938:309:000000000-K2R7T:1:1101:14275:1163, QUAL field is set to * (unspecified quality scores), this is allowed by the SAM specification but many tools expect reads to include qualities
WARNING::QUALITY_NOT_STORED:Record 2, Read name M04938:309:000000000-K2R7T:1:1101:14275:1163, QUAL field is set to * (unspecified quality scores), this is allowed by the SAM specification but many tools expect reads to include qualities
The same warning repeats for each of the first hundred records, and it appears to apply to all reads. I also tested with the FASTQ files from before they passed through the script, and base quality scores were present in the resulting SAM file.
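If GATK is not at hand, the same symptom can be checked directly in the SAM file. The following is a minimal sketch, not part of GATK or KrakenTools: count_missing_qual is a hypothetical helper name, and it assumes an uncompressed, tab-delimited SAM file. It counts alignment records whose QUAL field (column 11 in the SAM specification) is *, then prints that count followed by the total number of records.

```shell
# Hypothetical helper (not part of GATK or KrakenTools).
# Assumes an uncompressed, tab-delimited SAM file.
# Prints "<records with QUAL == *> <total records>".
count_missing_qual() {
  local sam="$1"
  awk -F'\t' '
    /^@/ { next }              # skip header lines (@HD, @SQ, @RG, @PG)
    $11 == "*" { missing++ }   # QUAL not stored for this record
    { total++ }                # count every alignment record
    END { printf "%d %d\n", missing + 0, total + 0 }
  ' "$sam"
}
```

If both numbers printed for the aligned SAM file are equal, no record carries base qualities.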
Is there something that I might have missed or done wrong?
My usage of both extract_kraken_reads.py and the bwa mem aligner seems very straightforward:
python extract_kraken_reads.py -r /data/devel/mtb_wgs/Output_kraken2/$1/$1"_kraken2_report.txt" -k /data/devel/mtb_wgs/Output_kraken2/$1/$1"_krakenout.txt" -s1 /data/devel/mtb_wgs/Output_fastp/$1/$1"_trimmed-dedup.fq.gz" -s2 /data/devel/mtb_wgs/Output_fastp/$1/$2"_trimmed-dedup.fq.gz" -o /data/devel/mtb_wgs/Output_kraken2/$1/$1"_extracted.fastq" -o2 /data/devel/mtb_wgs/Output_kraken2/$1/$2"_extracted.fastq" --taxid 1763 --include-children
bwa mem H37Rv -R $(echo "@RG\tID:$id\tSM:$sm""$id\tLB:$sm""$id\tPL:ILLUMINA") $path_fastqKraken$1"_extracted.fastq.gz" $path_fastqKraken$2"_extracted.fastq.gz" > $path_Output_bwa_gatk$1"_bwa_mem_align_H37Rv.sam"
Thanks for the help, and let me know if you need more information.
Benjamin
Hi Benjamin,
I am not the developer of this software, but I believe you need to supply the --fastq-output flag when running extract_kraken_reads.py. You should then get FASTQ files with quality scores in them as output.
Best,
Dave
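Dave's suggestion fits the symptom. As far as I can tell, the KrakenTools documentation lists FASTA as the default output of extract_kraken_reads.py when --fastq-output is not given. A FASTA record has no quality line, so bwa mem has nothing to copy into QUAL and writes * instead, even if the output file name ends in .fastq. A quick way to confirm which format a file actually contains is to look at its first character: > for FASTA, @ for FASTQ. The sketch below is an illustration under assumptions: fastx_format is a hypothetical helper name, and it assumes bash with standard gzip and head.

```shell
# Hypothetical helper: report whether a (possibly gzipped) sequence file
# is FASTA or FASTQ by inspecting its first character.
# FASTA records start with ">", FASTQ records start with "@".
fastx_format() {
  local first
  # gzip -cdf decompresses .gz input and copies plain files through unchanged
  first=$(gzip -cdf -- "$1" | head -c 1)
  case "$first" in
    ">") echo "fasta" ;;
    "@") echo "fastq" ;;
    *)   echo "unknown" ;;  # empty file or unexpected content
  esac
}
```

Running fastx_format on the *_extracted.fastq outputs before and after adding --fastq-output should report fasta and fastq respectively.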
Thank you, Dave; the flag fixed my problem!
Best,
Benjamin