The scripts in this repository implement the method described in Larson et al. 2015 (https://www.ncbi.nlm.nih.gov/pubmed/26510457) for detecting spinal muscular atrophy (SMA) carriers. The method combines carrier probabilities with sequencing coverage at SMN1 loci to assess SMA carrier status. These scripts are in beta.
Calculate coverage per gene and at the three SMN loci that distinguish SMN1 from SMN2. Inputs:
- bam_list # file with one line per sample (tab-delimited: absolute BAM path and whether ice or agilent was used)
- GATKjar # path to the GATK JAR
- reference # path to the GRCh37 reference genome
- sma_intervals # interval file for the SMN loci
- output_dir # name of the output directory
- picard # path to the Picard JAR
- scripts_dir # path to the directory containing the SMA scripts
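A malformed bam_list is an easy way for the coverage step to fail partway through, so it can help to validate the file first. The following is a minimal Python sketch, assuming the second column holds the literal labels ice or agilent (matched case-insensitively) and that BAM paths are POSIX paths. `parse_bam_list` and `VALID_KITS` are hypothetical names for illustration, not part of the repository.

```python
import posixpath
from collections import defaultdict

# Assumed spelling of the kit labels in the second column (hypothetical).
VALID_KITS = {"ice", "agilent"}


def parse_bam_list(lines):
    """Parse bam_list lines into {kit: [absolute BAM paths]} (hypothetical helper)."""
    by_kit = defaultdict(list)
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 tab-separated fields, got {len(fields)}")
        # strip() also drops a trailing "\r" from Windows-style line endings
        bam_path, kit = fields[0].strip(), fields[1].strip().lower()
        if not posixpath.isabs(bam_path):
            raise ValueError(f"line {lineno}: BAM path is not absolute: {bam_path}")
        if kit not in VALID_KITS:
            raise ValueError(f"line {lineno}: unknown kit {kit!r}")
        by_kit[kit].append(bam_path)
    return dict(by_kit)


if __name__ == "__main__":
    example = ["/data/bams/sample1.bam\tice\n", "/data/bams/sample2.bam\tagilent\n"]
    print(parse_bam_list(example))
```

Grouping by kit mirrors the later steps, which run separately on ice and agilent samples.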
Merge the SMN coverage results from all samples into one file. Inputs:
- bam_list # file with one line per sample (tab-delimited: absolute BAM path and whether ice or agilent was used)
- output_dir # name of the output directory (use the same one as in the previous step)
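The merge step collects per-sample coverage into a single table. The per-sample file format is not described here, so this Python sketch assumes each sample's coverage has already been read into a dict of locus name to depth; `merge_coverage` and the locus names are hypothetical. It writes one row per sample and one column per locus.

```python
import csv
import io


def merge_coverage(per_sample):
    """Merge {sample: {locus: depth}} into one tab-separated table (hypothetical helper).

    Writes one row per sample and one column per locus; a locus missing for a
    sample is written as 0 so every row has the same number of columns.
    """
    loci = sorted({locus for cov in per_sample.values() for locus in cov})
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample"] + loci)
    for sample in sorted(per_sample):
        cov = per_sample[sample]
        writer.writerow([sample] + [cov.get(locus, 0) for locus in loci])
    return out.getvalue()


if __name__ == "__main__":
    # Locus names and depths below are made up for illustration.
    coverage = {
        "sample2": {"SMN_locus_1": 41},
        "sample1": {"SMN_locus_1": 55, "SMN_locus_2": 48},
    }
    print(merge_coverage(coverage), end="")
```

Filling missing loci with 0 keeps the table rectangular; whether the real scripts do the same is not stated.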
Calculate theta, di, ri and pi. Inputs:
- cov_directory # name of the output directory (use the same one as in the previous steps)
- bam_files # file with one line per sample (tab-delimited: absolute BAM path and whether ice or agilent was used)
- interval_of_interest # whether to run on the ice or the agilent samples
- datamash # path to the datamash executable
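How theta, di, ri and pi are defined is not spelled out here, so the following Python sketch uses one assumed reading purely for illustration: d_i is the total read depth at the SMN1/SMN2-distinguishing loci for sample i, r_i is the number of those reads carrying the SMN1-specific base, p_i = r_i / d_i, and theta is the pooled fraction sum(r_i) / sum(d_i). These definitions are not taken from the script or from Larson et al. 2015; check both before relying on them. The sketch does not call datamash, and `smn1_fractions` is a hypothetical name.

```python
def smn1_fractions(counts):
    """Return (theta, {sample: p_i}) from {sample: (r_i, d_i)}.

    ASSUMPTION (illustrative only): r_i counts reads carrying the
    SMN1-specific base and d_i is total depth at the distinguishing loci.
    """
    if not counts:
        raise ValueError("no samples given")
    p = {}
    for sample, (r, d) in counts.items():
        if d <= 0 or not 0 <= r <= d:
            raise ValueError(f"{sample}: need d_i > 0 and 0 <= r_i <= d_i")
        p[sample] = r / d  # per-sample SMN1 read fraction
    total_r = sum(r for r, _ in counts.values())
    total_d = sum(d for _, d in counts.values())
    theta = total_r / total_d  # pooled fraction across all samples
    return theta, p


if __name__ == "__main__":
    # Made-up counts for two samples.
    theta, p = smn1_fractions({"sample1": (30, 60), "sample2": (10, 40)})
    print(theta, p)
```

Pooling the raw counts (rather than averaging the p_i) weights each sample by its depth; this is a design choice of the sketch, not necessarily of the script.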
Calculate the carrier probability and plot credible intervals. Inputs:
- the ice_sma_sample_stat.txt or agilent_sma_sample_stat.txt file
- the output directory
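The model behind the carrier probability is not described here. As a generic, swapped-in illustration of what a credible interval is, this Python sketch places a Beta prior on a binomial proportion, given counts r (reads supporting SMN1) and d (total depth), both assumed inputs. It draws from the Beta posterior with the standard library's `random.betavariate`, then reads off an equal-tailed interval and the posterior probability below a user-chosen cutoff. The uniform prior, the cutoff and all function names are hypothetical; this is not the repository's model.

```python
import random


def posterior_draws(r, d, prior=(1.0, 1.0), n_draws=20000, seed=0):
    """Sorted draws from the Beta(a + r, b + d - r) posterior (hypothetical helper)."""
    if d < 0 or not 0 <= r <= d:
        raise ValueError("need 0 <= r <= d")
    a, b = prior[0] + r, prior[1] + (d - r)
    rng = random.Random(seed)  # seeded so results are reproducible
    return sorted(rng.betavariate(a, b) for _ in range(n_draws))


def credible_interval(draws, level=0.95):
    """Equal-tailed interval read off sorted posterior draws."""
    if not 0 < level < 1:
        raise ValueError("level must be in (0, 1)")
    tail = (1 - level) / 2
    n = len(draws)
    return draws[int(tail * (n - 1))], draws[int((1 - tail) * (n - 1))]


def prob_below(draws, cutoff):
    """Posterior probability that the proportion is below cutoff."""
    return sum(x < cutoff for x in draws) / len(draws)


if __name__ == "__main__":
    draws = posterior_draws(30, 60)  # made-up counts
    print(credible_interval(draws), prob_below(draws, 0.5))
```

Sampling keeps the sketch dependency-free; with SciPy available, exact quantiles from `scipy.stats.beta.ppf` would replace the Monte Carlo step.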
Dependencies: GATK, Picard, datamash