k8hertweck/Clostridium

improve variant calling pipeline

k8hertweck opened this issue · 0 comments

  • add a step to mark and remove PCR duplicates (after read mapping, before SNP calling), using Picard MarkDuplicates
  • filter out sites with extremely high depth of coverage (read the VCF file into R, plot the distribution of depths, and remove any sites whose depth exceeds a cutoff of 1.5-2x the median depth of coverage, to get rid of the long tail)
  • separate SNPs from indels after variant calling using SelectVariants in GATK
  • account for possible false positive SNP calls around indels: use the mask option of GATK VariantFiltration (with the indel calls as the mask) to flag SNPs within 6 bases of an indel, then remove the flagged SNPs with SelectVariants
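
A rough sketch of the depth filter from the second item, written in Python rather than the R mentioned above. It assumes each site's depth is stored as a `DP=` entry in the VCF INFO column (column 8), and it uses 1.5x the median as the cutoff; the `factor` parameter is just a knob for trying values up to 2. The function names `site_depth` and `filter_high_depth` are made up for illustration and are not part of any existing pipeline code.

```python
import statistics


def site_depth(info):
    """Return the integer DP value from a VCF INFO string, or None if absent."""
    for field in info.split(";"):
        if field.startswith("DP="):
            return int(field[3:])
    return None


def filter_high_depth(vcf_lines, factor=1.5):
    """Drop sites whose INFO DP exceeds factor * median DP.

    Header lines (starting with '#') are kept unchanged. Sites without a
    DP entry are kept, since there is no depth to judge them by.
    Returns (kept_lines, cutoff); cutoff is None if no site has a DP value.
    """
    lines = [ln.rstrip("\n") for ln in vcf_lines if ln.strip()]
    records = [ln for ln in lines if not ln.startswith("#")]

    # Collect depths from the INFO column (index 7) of every record.
    depths = []
    for rec in records:
        depth = site_depth(rec.split("\t")[7])
        if depth is not None:
            depths.append(depth)

    if not depths:
        return lines, None

    cutoff = factor * statistics.median(depths)

    kept = []
    for ln in lines:
        if ln.startswith("#"):
            kept.append(ln)
            continue
        depth = site_depth(ln.split("\t")[7])
        if depth is None or depth <= cutoff:
            kept.append(ln)
    return kept, cutoff
```

To try it on a real file, something like `kept, cutoff = filter_high_depth(open("calls.vcf"))` should work, where `calls.vcf` is a placeholder name. It is still worth plotting the depth distribution first to check that a median-based cutoff actually trims the long tail for this dataset.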