1kgenomes-gvcfs

Overview

This repository contains scripts that analyze low-coverage FASTQ files from Phase 3 of the 1000 Genomes Project and produce gVCF files. The scripts are designed to run on Amazon EC2 using StarCluster, on a custom AMI that is described below. For each sample, raw FASTQ files are aligned with bwa, duplicates are marked with samblaster, alignments are sorted and indexed with sambamba, and variants are called with the GATK. The /data directory contains an NFS-shared EBS volume with the indexed reference genome (hs37d5.fa) and space for log files.
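The per-sample steps described above can be sketched roughly as follows. This is an illustrative outline, not the repository's actual scripts: the function name `build_pipeline`, the output file names, the read group string, the exact reference path under /data, the jar file name, and the assumption of paired-end input are all hypothetical. The function only builds the shell command strings, so it can be inspected without any of the tools installed.

```python
import shlex

# Assumed locations: the README only says the reference lives under /data
# and the GATK jar under /usr/local/bin/; the exact file names are guesses.
REFERENCE = "/data/hs37d5.fa"
GATK_JAR = "/usr/local/bin/GenomeAnalysisTK.jar"


def build_pipeline(sample, fq1, fq2, threads=4):
    """Return the shell commands for one sample, in execution order."""
    q = shlex.quote
    unsorted_bam = f"{sample}.unsorted.bam"
    sorted_bam = f"{sample}.sorted.bam"
    gvcf = f"{sample}.g.vcf"
    # bwa expects a literal backslash-t between read group fields.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"

    # 1. Align with bwa, mark duplicates with samblaster, convert SAM to BAM.
    align = (
        f"bwa mem -t {threads} -R {q(read_group)} {q(REFERENCE)} {q(fq1)} {q(fq2)}"
        f" | samblaster"
        f" | sambamba view -S -f bam -o {q(unsorted_bam)} /dev/stdin"
    )
    # 2. Coordinate-sort the alignments.
    sort = f"sambamba sort -t {threads} -o {q(sorted_bam)} {q(unsorted_bam)}"
    # 3. Index the sorted BAM.
    index = f"sambamba index {q(sorted_bam)}"
    # 4. Call variants with GATK 3 HaplotypeCaller in gVCF mode.
    call = (
        f"java -jar {q(GATK_JAR)} -T HaplotypeCaller -R {q(REFERENCE)}"
        f" -I {q(sorted_bam)} --emitRefConfidence GVCF -o {q(gvcf)}"
    )
    return [align, sort, index, call]


if __name__ == "__main__":
    for command in build_pipeline("SAMPLE", "SAMPLE_1.fastq.gz", "SAMPLE_2.fastq.gz"):
        print(command)
```

Each returned string could be passed to `subprocess.run(command, shell=True, check=True)`. Keeping command construction separate from execution makes it easier to log the exact commands to the shared /data volume before running them.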

AMI Details

The AMI contains the following software:

Python3
- retrying
- awscli
- boto3
mdadm (software RAID)
bwa version 0.7.15
samblaster version 0.1.22
sambamba version 0.6.3
Java 8
GATK version 3.5.0 (jar file in /usr/local/bin/)
