Sorting; low sensitivity

Question

Sorting; low sensitivity

KrzysztofMKozak opened this issue 9 years ago · 4 comments

Dear Heng,
Looks like a great tool and I tried to use it for a subset of the Anopheles 1000 Genomes data (https://www.malariagen.net/apps/ag1000g/phase1-AR3/index.html).
The first problem I encountered is at the sorting stage: once the first sam was made, fermi proceeded to sort it and generated ~150k tiny sam files - I had to kill it eventually.
The second issue: I ran clean up manually and genotyped with GATK, as GATK had been originally used to genotype after BWA-mapping the same reads to reference. When we compare the BWA and Fermikit genotypes, the Fermi pipeline results in ~18x fewer variants. What would be the key parameters that could help with this, please?
Best,
Chris

Answer 1 · 2015-10-12T13:37:52.000Z

Please use the command line in README for variant calling:

fermi.kit/run-calling -t16 bwa-indexed-ref.fa prefix.mag.gz | sh

GATK won't work well.

Answer 2 · 2015-10-12T13:45:39.000Z

Also to generate multi-sample vcf, first mapping and sort with fermi.kit/bwa mem -x intractg ref.fa unitigs.mag.gz && fermi.kit/htsbox samsort and then do calling with:

fermi.kit/htsbox pileup -cuf ref.fa pre1.srt.bam pre2.srt.bam > out.raw.vcf
fermi.kit/k8 fermi.kit/hapdip.js vcfsum -f out.raw.vcf > out.flt.vcf