Scripts used for the recessive association analysis in the G&H Flagship paper. Before this analysis, we phased our WES dataset using a Snakemake pipeline developed here.
This repository contains:
- `get_biallelic_carriers.sh`: identifies biallelic carriers, given phased genotypes. Also, see here for a more involved script that starts from the phased genotypes and creates VCF files ready for association testing.
- `get_ch_homz_counts.sh`: generates a tally of the numbers of compound-heterozygous and homozygous genotypes.
- `concat_vcfs.sh`: concatenates the per-chromosome VCFs for a given consequence into a single VCF, and creates a BGEN file for more efficient association testing.
- `run_regenie_{burden/sv/perm}.sh`: scripts that invoke REGENIE for gene-burden, variant-level, or permutation (i.e. FDR) analysis, respectively. Each script has several modules, e.g. step1 (when needed) or step2, for quantitative or binary traits, and the module is selected by the arguments. For example, `bash run_regenie.sh step2qt pLOF` runs step 2 for quantitative traits and tests the pLOF burdens.
- `dominance_deviation_{linear/logistic}.R`: R scripts to test for dominance deviation for a candidate gene-trait pair. They require a table with ADD/DOM/REC encodings per gene, which can be obtained with `dominance_deviation_prep.py`.
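To illustrate the kind of classification that identifying biallelic carriers and tallying compound-het and homozygous genotypes involves, here is a minimal Python sketch. It is not the repository's implementation. It assumes phased genotype strings such as `0|1` for one sample across the qualifying variants of a single gene, treats any non-reference allele as qualifying and missing alleles (`.`) as reference, and gives homozygous calls precedence over compound-het ones. The function names and class labels are hypothetical.

```python
from collections import Counter


def carries_alt(allele):
    """True if an allele index is non-reference; '.' counts as reference (assumption)."""
    return allele not in ("0", ".")


def classify_gene(genotypes):
    """Classify one sample's phased genotypes (e.g. ['0|1', '1|0']) across the
    qualifying variants of one gene (hypothetical helper).

    Returns 'hom', 'ch' (compound het: hits in trans), 'cis_het' (several hits,
    all on one haplotype), 'het' (a single hit) or 'none'.
    Unphased calls such as '0/1' raise ValueError, because phase is required.
    """
    hit_hap1 = hit_hap2 = False
    n_hits = 0
    for gt in genotypes:
        a1, a2 = gt.split("|")
        h1, h2 = carries_alt(a1), carries_alt(a2)
        if h1 and h2:
            return "hom"  # both copies hit at the same site
        hit_hap1 = hit_hap1 or h1
        hit_hap2 = hit_hap2 or h2
        if h1 or h2:
            n_hits += 1
    if hit_hap1 and hit_hap2:
        return "ch"  # hits on both haplotypes, at different sites
    if n_hits > 1:
        return "cis_het"  # one haplotype still intact
    return "het" if n_hits == 1 else "none"


def tally(samples):
    """Count classes over samples (dict: sample id -> genotype list for one gene)."""
    return Counter(classify_gene(gts) for gts in samples.values())


example = {
    "S1": ["0|0", "1|0", "0|1"],  # hits on both haplotypes -> 'ch'
    "S2": ["1|1", "0|0", "0|0"],  # -> 'hom'
    "S3": ["1|0", "1|0", "0|0"],  # both hits on haplotype 1 -> 'cis_het'
    "S4": ["0|0", "0|1", "0|0"],  # -> 'het'
}
counts = tally(example)
print(counts["ch"], counts["hom"])  # 1 1
```

Phase is what separates a compound het (one hit on each haplotype, so no intact copy) from a cis double-het (both hits on one haplotype, so one intact copy), which is why unphased input is rejected rather than guessed.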
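The dominance-deviation test can be sketched as follows, under a hypothetical coding: ADD counts affected gene copies (0 = non-carrier, 1 = heterozygous or cis-het carrier, 2 = homozygous or compound-het carrier), DOM = 1 for any carrier, and REC = 1 for biallelic carriers. The sketch regresses the trait on ADD plus a heterozygote indicator (HET = DOM - REC) by ordinary least squares and tests the HET coefficient, which equals the heterozygote mean minus the midpoint of the two other classes' means. It is a NumPy stand-in with a normal-approximation p-value, not the code of the R scripts, whose exact model and covariates may differ; a binary trait would need logistic regression instead of OLS. The function names are hypothetical.

```python
import math

import numpy as np


def encode(classes):
    """Hypothetical ADD/DOM/REC coding from per-sample labels
    ('none', 'het', 'cis_het', 'ch', 'hom'); also returns HET = DOM - REC."""
    g = np.array([2 if c in ("hom", "ch") else 1 if c in ("het", "cis_het") else 0
                  for c in classes], dtype=float)
    add = g                       # number of affected gene copies
    dom = (g >= 1).astype(float)  # any carrier
    rec = (g == 2).astype(float)  # biallelic carrier
    het = dom - rec               # single-copy carrier indicator
    return add, dom, rec, het


def dominance_deviation_test(y, add, het, covariates=None):
    """OLS of y on intercept + ADD + HET (+ optional covariate columns).

    Returns (d_hat, se, t, p): d_hat is the HET coefficient (the dominance
    deviation); p is two-sided from a normal approximation.
    """
    cols = [np.ones_like(add), add, het]
    if covariates is not None:
        cols.append(np.asarray(covariates, dtype=float))  # 1-D or (n, k) array
    X = np.column_stack(cols)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    n, k = X.shape
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)           # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # OLS covariance of beta
    d_hat, se = beta[2], math.sqrt(cov[2, 2])
    t = d_hat / se
    p = math.erfc(abs(t) / math.sqrt(2))
    return d_hat, se, t, p


# Usage on simulated data with a purely recessive effect (no real data involved).
rng = np.random.default_rng(1)
classes = ["none"] * 1000 + ["het"] * 1000 + ["hom"] * 1000
add, dom, rec, het = encode(classes)
y = 1.0 * rec + rng.normal(size=add.size)
d_hat, se, t, p = dominance_deviation_test(y, add, het)
print(f"d = {d_hat:.3f}, se = {se:.3f}, p = {p:.2e}")
```

Under this coding HET = ADD - 2*REC, so with an intercept the ADD + HET model spans the same space as ADD + REC, and testing the REC coefficient there gives the same t statistic up to sign.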
Georgios Kalantzis, gk18@sanger.ac.uk
August 2024