Nanopolish Variants Using Low Proportion Of Reads
seanmsjkim opened this issue · 0 comments
Hi,
I'm trying to use nanopolish to generate consensus sequences from bam files. I am finding that `nanopolish variants` is using very few of the reads indexed by `nanopolish index`. An example of the series of commands I am running is:
```shell
nanopolish index [fastq.gz] --sequencing-summary [sequencing_summary.txt] --directory [fast5_pass];
nanopolish variants -v --min-flanking-sequence 10 -x 1000000 --progress -t 4 --reads [fastq.gz] -o [vcf] -b [bam] --ploidy 2 -m 0.15 -g [ref_genome];
nanopolish vcf2fasta --skip-checks -g [ref_genome] [vcf] > [consensus_fasta];
```
From running these, I might get output like the following:
```
[readdb] num reads: 18619, num reads with path to fast5: 18619
[post-run summary] total reads: 5, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 1, bad fast5: 0
[vcf2fasta] rewrote contig NR_115606.1 with 0 subs, 0 ins, 0 dels (0 skipped)
```
I'm sure nanopolish does some level of read QC/discarding, but I'm not sure which parameters are causing my reads to fail and be discarded. If I'm understanding correctly, only about 0.027% of my reads (5 of 18619) were acceptable in this case? Most samples are not this severe, but I'd love to know why this is happening and whether the thresholds can be tweaked, if possible!
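Just to double-check my arithmetic, here is the proportion computed directly from the two counts in the log output above (no other assumptions):

```python
# Counts taken from the nanopolish log output above.
total_indexed = 18619    # [readdb] num reads
used_by_variants = 5     # [post-run summary] total reads

fraction = used_by_variants / total_indexed
print(f"{fraction:.4%}")  # roughly 0.0269%
```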
Some properties of my data:
- v9.4 flowcells
- 1600 bp amplicon/read length
- Bacterial 16S
TIA,
Sean