Sorting; low sensitivity
KrzysztofMKozak opened this issue · 4 comments
Dear Heng,
Looks like a great tool and I tried to use it for a subset of the Anopheles 1000 Genomes data (https://www.malariagen.net/apps/ag1000g/phase1-AR3/index.html).
The first problem I encountered is at the sorting stage: once the first sam was made, fermi proceeded to sort it and generated ~150k tiny sam files - I had to kill it eventually.
The second issue: I ran clean up manually and genotyped with GATK, as GATK had been originally used to genotype after BWA-mapping the same reads to reference. When we compare the BWA and Fermikit genotypes, the Fermi pipeline results in ~18x fewer variants. What would be the key parameters that could help with this, please?
Best,
Chris
Please use the command line in README for variant calling:
fermi.kit/run-calling -t16 bwa-indexed-ref.fa prefix.mag.gz | sh
GATK won't work well.
Also to generate multi-sample vcf, first mapping and sort with fermi.kit/bwa mem -x intractg ref.fa unitigs.mag.gz && fermi.kit/htsbox samsort
and then do calling with:
fermi.kit/htsbox pileup -cuf ref.fa pre1.srt.bam pre2.srt.bam > out.raw.vcf
fermi.kit/k8 fermi.kit/hapdip.js vcfsum -f out.raw.vcf > out.flt.vcf
See also README.
Thank you! Just to clarify, GATK was used in order to compare the results of mapping raw reads and mapping unitigs, without having the extra confound of using different variant callers.
Unitigs have very different characteristics from short reads. Most existing tools won't play well with unitigs.