EXOME Sequencing pipeline - germline only

A Python-based script that runs the exome sequencing analysis commands on the Stanford genomics cluster.

Input example:

python exome_file_command.py XXXXX_merged.bam

The pipeline accepts only BWA-aligned BAM files.
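Here is a hypothetical sketch of how the script might derive a sample prefix for its output files from the input name. It assumes the XXXXX_merged.bam naming from the example above and only checks the file extension. Confirming that the BAM was produced by BWA would require inspecting its @PG header lines, which this sketch does not do.

```python
import os


def sample_prefix(bam_path):
    """Derive a sample prefix from an input such as 'XXXXX_merged.bam'.

    Assumption (hypothetical convention): a trailing '_merged' is stripped;
    any other .bam name just loses its extension.
    """
    name = os.path.basename(bam_path)
    if not name.endswith(".bam"):
        raise ValueError("expected a BWA-aligned .bam file, got: " + name)
    stem = name[: -len(".bam")]
    if stem.endswith("_merged"):
        stem = stem[: -len("_merged")]
    return stem
```

For example, sample_prefix("XXXXX_merged.bam") returns "XXXXX".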

STEPS (following the GATK best practices):

Sort the BWA-aligned BAM file - Picard SortSam
Reorder the sorted BAM file to match the hg19 reference contig order - Picard ReorderSam
Mark duplicates in the reordered BAM file - Picard MarkDuplicates
Build the BAM index of the duplicate-marked (dedup) BAM file - Picard BuildBamIndex
Run base quality score recalibration on the dedup BAM file - GATK BaseRecalibrator
Write out the recalibrated reads - GATK PrintReads
Call genotypes directly to a VCF, or to a per-sample g.vcf file when there are many samples (>30) - GATK HaplotypeCaller
Filter variants with the scripts VQSR_S1 to VQSR_S4, following the GATK best practices
Perform variant evaluation - the expected Ti/Tv ratio for whole exome data is > 2.5
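The order of the calling steps above can be sketched as a function that builds one command line per step. This is a hypothetical illustration, not the contents of exome_file_command.py. The jar paths, the intermediate file names and the dbSNP file name are assumptions. The flags follow the Picard 2.x KEY=VALUE syntax and the GATK 3.x "-T walker" syntax, and the switch to gVCF output above 30 samples mirrors the HaplotypeCaller step.

```python
# Hypothetical tool and resource locations; adjust to the cluster layout.
PICARD = "java -jar picard.jar"
GATK = "java -jar GenomeAnalysisTK.jar"
REF = "hg19.fasta"
MILLS = "Mills_and_1000G_gold_standard.indels.hg19.sites.vcf"
DBSNP = "dbsnp_138.hg19.vcf"  # assumed name of the dbSNP 138 bundle file


def build_commands(bam, prefix, n_samples=1):
    """Return the per-sample command lines in pipeline order (not executed)."""
    sorted_bam = prefix + ".sorted.bam"
    reordered = prefix + ".reordered.bam"
    dedup = prefix + ".dedup.bam"
    recal_table = prefix + ".recal.table"
    recal_bam = prefix + ".recal.bam"
    cmds = [
        f"{PICARD} SortSam I={bam} O={sorted_bam} SORT_ORDER=coordinate",
        f"{PICARD} ReorderSam I={sorted_bam} O={reordered} R={REF}",
        f"{PICARD} MarkDuplicates I={reordered} O={dedup} M={prefix}.dup_metrics.txt",
        f"{PICARD} BuildBamIndex I={dedup}",
        f"{GATK} -T BaseRecalibrator -R {REF} -I {dedup} "
        f"-knownSites {DBSNP} -knownSites {MILLS} -o {recal_table}",
        f"{GATK} -T PrintReads -R {REF} -I {dedup} -BQSR {recal_table} -o {recal_bam}",
    ]
    hc = f"{GATK} -T HaplotypeCaller -R {REF} -I {recal_bam}"
    if n_samples > 30:
        # Many samples: write a per-sample gVCF for later joint genotyping.
        hc += f" --emitRefConfidence GVCF -o {prefix}.g.vcf"
    else:
        # Few samples: call genotypes directly to a VCF.
        hc += f" -o {prefix}.vcf"
    cmds.append(hc)
    return cmds
```

Returning strings instead of running them keeps the sketch testable. It also lets the same list be written into a cluster job script.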
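The Ti/Tv check in the last step is a simple ratio. Transitions are A<->G and C<->T substitutions, every other single-nucleotide change is a transversion, and Ti/Tv is the count of the first divided by the count of the second. The following is a minimal pure-Python sketch over VCF text lines, not the pipeline's own evaluation step (GATK 3 provides the VariantEval walker for this). It counts each ALT allele of an SNV separately and skips indels and non-ACGT alleles.

```python
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def snvs_from_vcf(lines, pass_only=False):
    """Yield (REF, ALT) pairs from VCF text lines, splitting multi-allelic ALTs."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if pass_only and fields[6] != "PASS":
            continue  # optionally keep only records that passed filtering
        for alt in fields[4].split(","):
            yield fields[3], alt


def titv_ratio(pairs):
    """Transition/transversion ratio; inf if there are no transversions."""
    ti = tv = 0
    for ref, alt in pairs:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue  # not a single-nucleotide change
        if ref not in "ACGT" or alt not in "ACGT":
            continue  # e.g. N or '*' alleles
        if frozenset((ref, alt)) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")
```

For example, titv_ratio([("A", "G"), ("C", "T"), ("A", "C")]) returns 2.0.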

Dependencies: picard-tools/2.14, gatk/3.7, hg19.fasta, Mills_and_1000G_gold_standard.indels.hg19.sites.vcf and dbSNP 138 (db138) from the GATK best practices resource bundle.
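Before a job is submitted, it can help to confirm that the reference and resource files are present. This sketch assumes that they all live in one bundle directory and that the dbSNP 138 file is named dbsnp_138.hg19.vcf (both assumptions). It also lists hg19.fasta.fai and hg19.dict, because GATK 3 requires a FASTA index and a sequence dictionary next to the reference.

```python
import os

# Assumed file names; the dbSNP 138 name in particular is a guess.
REQUIRED = [
    "hg19.fasta",
    "hg19.fasta.fai",  # FASTA index required by GATK
    "hg19.dict",       # sequence dictionary required by GATK
    "Mills_and_1000G_gold_standard.indels.hg19.sites.vcf",
    "dbsnp_138.hg19.vcf",
]


def missing_resources(bundle_dir, required=REQUIRED):
    """Return the required resource files that are absent from bundle_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(bundle_dir, name))]
```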

ONLY TO BE USED ON A SUN GRID ENGINE JOB SUBMISSION CLUSTER with QSUB
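On an SGE cluster, the generated commands would typically be wrapped in a job script and submitted with qsub. Here is a minimal sketch, assuming a bash job script. The h_vmem memory resource and the 8G default are assumptions, because resource names and limits differ between clusters.

```python
import subprocess


def sge_script(job_name, commands, mem="8G"):
    """Wrap shell commands in an SGE job script (h_vmem is an assumed resource name)."""
    lines = [
        "#!/bin/bash",
        "#$ -S /bin/bash",      # run the job under bash
        f"#$ -N {job_name}",    # job name shown by qstat
        "#$ -cwd",              # run in the submission directory
        f"#$ -l h_vmem={mem}",  # per-job memory request (cluster-specific)
        "set -e",               # stop at the first failing step
    ]
    lines.extend(commands)
    return "\n".join(lines) + "\n"


def submit(script_path):
    """Submit a job script with qsub and return qsub's stdout."""
    result = subprocess.run(["qsub", script_path], check=True,
                            capture_output=True, text=True)
    return result.stdout
```

Writing the script to a file and calling submit() on it keeps each sample as one SGE job, with set -e ensuring that a failed step does not feed a truncated BAM into the next one.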

adiamb/EXOME_SEQ_PIPELINE
