ahishsujay/SNP_Pipeline

Pipeline for variant / SNP calling

SNP_Pipeline

Pipeline for calling Single Nucleotide Polymorphisms (SNPs). The pipeline is written in bash.

Variant/SNP calling pipeline steps:

Align the FASTQ reads to a reference genome to create an alignment file - Mapping step
Process the alignment file (file format conversion, sorting, alignment improvement) - Improvement step
Call the variants - Variant Calling step
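
The three steps above can be sketched as the command sequence below. This is a minimal sketch under stated assumptions, not the repository's actual script: the helper name pipeline_plan, the intermediate file names (aln.sam, aln_sorted.bam, and so on), the read group string, the location of GenomeAnalysisTK.jar, and the use of bcftools for the calling step are all assumptions made for illustration. The helper only prints the commands so they can be reviewed before anything is run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one possible command sequence for the three
# pipeline steps. Nothing is executed; file names are illustrative only.
pipeline_plan() {
    local ref=$1 r1=$2 r2=$3 mills=$4 out=$5
    cat <<EOF
# Mapping step: index the reference, then align the paired reads
bwa index $ref
bwa mem -R '@RG\tID:sample\tSM:sample\tLB:lib1' $ref $r1 $r2 > aln.sam

# Improvement step: convert to BAM, sort, index, realign around indels
samtools fixmate -O bam aln.sam aln_fixmate.bam
samtools sort -O bam -o aln_sorted.bam aln_fixmate.bam
samtools index aln_sorted.bam
samtools faidx $ref
samtools dict $ref -o ${ref%.*}.dict
java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R $ref -I aln_sorted.bam -known $mills -o aln.intervals
java -jar GenomeAnalysisTK.jar -T IndelRealigner -R $ref -I aln_sorted.bam -targetIntervals aln.intervals -known $mills -o aln_realigned.bam

# Variant Calling step: pile up the reads and call variants into a VCF
bcftools mpileup -Ou -f $ref aln_realigned.bam | bcftools call -vmO v -o $out
EOF
}
```

For example, pipeline_plan ref.fa reads_1.fq reads_2.fq mills.vcf out.vcf prints the plan for those inputs. File names containing spaces are not handled in this sketch.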

Pipeline Requirements:

bwa for the alignment
samtools/HTS package for processing the alignment file and calling variants
GATK for improving the alignment. You must use GATK v3.7.0, available on the Archived version page
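
Before running the pipeline, it can help to confirm that the required tools are on the PATH. The function below is a minimal sketch; the name check_tools is hypothetical, and checking for java stands in for GATK on the assumption that GATK v3.7.0 is launched from its jar file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports every tool that is not found on the PATH.
# Returns 0 if all tools are present, 1 otherwise.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Missing required tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A script might call check_tools bwa samtools java || exit 1 near the top. This only checks that the programs exist; it does not verify the GATK version.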

Input command line options:

-a Input reads file – pair 1
-b Input reads file – pair 2
-r Reference genome file
-e Perform read re-alignment
-o Output VCF file name
-f Mills file location
-z Output VCF file should be gzipped (*.vcf.gz)
-v Verbose mode; print each instruction/command to tell the user what the script is doing right now
-i Index your output BAM file (using samtools index)
-h Print usage information (how to run the script and the arguments it takes) and exit
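
The options above map naturally onto the getopts builtin. The following is a sketch of how they might be parsed, not the script's actual code; the function names (usage, parse_args) and variable names (reads1, reads2, ref, millsFile, output, realign, compress, verbose, index) are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical usage message built from the option list above.
usage() {
    echo "Usage: $0 -a <reads pair 1> -b <reads pair 2> -r <reference> -f <Mills file> -o <output VCF> [-e] [-z] [-v] [-i] [-h]"
}

# Hypothetical parser: options followed by ':' in the getopts string take
# an argument; the others are simple on/off flags.
parse_args() {
    reads1="" reads2="" ref="" millsFile="" output=""
    realign=0 compress=0 verbose=0 index=0
    OPTIND=1
    while getopts "a:b:r:o:f:ezvih" opt; do
        case $opt in
            a) reads1=$OPTARG ;;
            b) reads2=$OPTARG ;;
            r) ref=$OPTARG ;;
            o) output=$OPTARG ;;
            f) millsFile=$OPTARG ;;
            e) realign=1 ;;
            z) compress=1 ;;
            v) verbose=1 ;;
            i) index=1 ;;
            h) usage; return 2 ;;   # caller is expected to exit after help
            *) usage >&2; return 1 ;;
        esac
    done
}
```

A script would typically call parse_args "$@" and exit if it returns non-zero.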

Required input files:

Input reads file - pair1
Input reads file - pair2
Reference genome file
Mills file
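
Since all four input files are required, a script could check them up front and stop with a clear message if any is missing. This is a minimal sketch; the function name require_files is hypothetical and the check is an assumption about sensible behaviour, not a description of the actual script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verifies that each given path exists and is not empty.
# Returns 0 if every file passes, 1 otherwise.
require_files() {
    local f status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then   # -s is true for an existing, non-empty file
            echo "Input file missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

It could be called with the two reads files, the reference genome file and the Mills file, in any order.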

Execution of the bash script:

./snp_pipeline.bash -a <input reads file - pair1> -b <input reads file - pair2> -r <reference genome file> -f <Mills file> -o <output VCF file name>

Output file:

VCF File