SNP_Pipeline
Pipeline for calling Single Nucleotide Polymorphisms (SNPs). The pipeline is written in bash.
Variant/SNP calling pipeline steps:
- Align FASTQ reads to a reference genome to create an alignment file - Mapping step
- Process the alignment file (file format conversion, sorting, alignment improvement) - Improvement step
- Call the variants - Variant Calling step
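The three steps above can be sketched as a command plan. This is a minimal illustration, not the actual contents of snp_pipeline.bash: the helper name `print_pipeline_plan`, the intermediate file names, the jar location, and the use of bcftools for calling are all assumptions. It only prints the commands, so it runs without any of the tools installed.

```bash
#!/usr/bin/env bash
# Sketch only: prints one plausible command per stage instead of running it.
# Intermediate names (lane.sam, lane_sorted.bam, ...) and the jar path are
# illustrative assumptions, not taken from the pipeline itself.
print_pipeline_plan() {
    local ref="$1" reads1="$2" reads2="$3" mills="$4" out_vcf="$5"
    local gatk="java -jar GenomeAnalysisTK.jar"

    # Mapping step: index the reference, then align both read files.
    printf '%s\n' "bwa index $ref"
    printf '%s\n' "bwa mem $ref $reads1 $reads2 > lane.sam"

    # Improvement step: convert to sorted BAM, index it, then realign
    # around known indels (GATK 3.x also expects read groups in the BAM
    # and .fai/.dict files next to the reference).
    printf '%s\n' "samtools sort -O bam -o lane_sorted.bam lane.sam"
    printf '%s\n' "samtools index lane_sorted.bam"
    printf '%s\n' "$gatk -T RealignerTargetCreator -R $ref -I lane_sorted.bam -known $mills -o lane.intervals"
    printf '%s\n' "$gatk -T IndelRealigner -R $ref -I lane_sorted.bam -targetIntervals lane.intervals -known $mills -o lane_realigned.bam"

    # Variant calling step: pile up the reads and call variants into a VCF.
    printf '%s\n' "bcftools mpileup -Ou -f $ref lane_realigned.bam | bcftools call -mv -Ov -o $out_vcf"
}

print_pipeline_plan ref.fa reads_1.fq reads_2.fq mills.vcf out.vcf
```

Printing the plan first makes it easy to compare the intended order of operations with what the real script does in verbose mode.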
Pipeline Requirements:
- bwa for the alignment
- samtools/HTS package for processing and calling variants
- GATK for improving the alignment. You must use GATK v3.7.0, which is available from the GATK archived versions page
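A script can check these requirements up front and fail early. The following is a sketch under assumptions: `missing_tools`, `check_requirements`, and the `GATK_JAR` variable are hypothetical names, and the default jar name assumes the usual GATK 3.x distribution file, GenomeAnalysisTK.jar.

```bash
#!/usr/bin/env bash
# Prints the name of every command in "$@" that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Hypothetical check: bwa, samtools and java must be on PATH, and the
# GATK jar must exist. GATK_JAR is an assumed environment variable.
check_requirements() {
    local missing
    missing="$(missing_tools bwa samtools java | tr '\n' ' ')"
    if [ -n "$missing" ]; then
        echo "Missing required tools: $missing" >&2
        return 1
    fi
    if [ ! -f "${GATK_JAR:-GenomeAnalysisTK.jar}" ]; then
        echo "GATK jar not found: ${GATK_JAR:-GenomeAnalysisTK.jar}" >&2
        return 1
    fi
}
```

Calling `check_requirements || exit 1` near the top of the script would stop the run before any alignment work starts.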
Input command line options:
- -a Input reads file – pair 1
- -b Input reads file – pair 2
- -r Reference genome file
- -e Perform read re-alignment
- -o Output VCF file name
- -f Mills file location
- -z Output VCF file should be gzipped (*.vcf.gz)
- -v Verbose mode; print each instruction/command so the user can see what the script is currently doing
- -i Index the output BAM file (using samtools index)
- -h Print usage information (how to run the script and the arguments it takes) and exit
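One common way to read options like these in bash is the `getopts` builtin. The sketch below assumes the option letters listed above; the function names (`usage`, `parse_args`, `log`) and the variable names are illustrative and may differ from the real script.

```bash
#!/usr/bin/env bash
usage() {
    echo "Usage: ./snp_pipeline.bash -a <reads pair 1> -b <reads pair 2> -r <reference> -f <Mills file> -o <output VCF> [-e] [-z] [-v] [-i] [-h]"
}

# Parses the options into global variables (names are assumptions).
parse_args() {
    reads1="" reads2="" ref="" output="" mills=""
    realign=0 gzip_output=0 verbose=0 index_bam=0
    local opt
    OPTIND=1   # reset so the function can be called more than once
    # Leading ':' selects silent error mode; a ':' after a letter means
    # that option takes an argument.
    while getopts ":a:b:r:o:f:ezvih" opt; do
        case "$opt" in
            a) reads1="$OPTARG" ;;
            b) reads2="$OPTARG" ;;
            r) ref="$OPTARG" ;;
            o) output="$OPTARG" ;;
            f) mills="$OPTARG" ;;
            e) realign=1 ;;
            z) gzip_output=1 ;;
            v) verbose=1 ;;
            i) index_bam=1 ;;
            h) usage; exit 0 ;;
            :) echo "Option -$OPTARG requires an argument" >&2; return 1 ;;
            *) echo "Unknown option -$OPTARG" >&2; return 1 ;;
        esac
    done
}

# Prints a message only in verbose mode (-v).
log() {
    if [ "$verbose" -eq 1 ]; then
        echo "$@"
    fi
}
```

In a script this would typically be invoked as `parse_args "$@" || { usage; exit 1; }`, with `log "Aligning reads..."` placed before each command.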
Required input files:
- Input reads file - pair 1
- Input reads file - pair 2
- Reference genome file
- Mills file
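Before running any tool, the script can verify that all four required files exist. This is a minimal sketch; `require_files` is a hypothetical helper name, and the variables in the usage comment are assumed to hold the values of -a, -b, -r and -f.

```bash
#!/usr/bin/env bash
# Returns non-zero if any of the given paths is not a readable regular
# file, reporting every missing file rather than only the first one.
require_files() {
    local f status=0
    for f in "$@"; do
        if [ ! -f "$f" ] || [ ! -r "$f" ]; then
            echo "Input file not found or not readable: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example use inside the pipeline (variable names are assumptions):
# require_files "$reads1" "$reads2" "$ref" "$mills" || exit 1
```

Reporting all missing files in one pass saves the user from fixing one path per run.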
Execution of the bash script:
./snp_pipeline.bash -a <input reads file - pair 1> -b <input reads file - pair 2> -r <reference genome file> -f <Mills file> -o <output VCF file name>
Output file:
VCF File