/evachan.org-Rscripts

Handy genetics-related R scripts

Primary language: R. License: GNU General Public License v2.0 (GPL-2.0).

Handy R functions for genetics research

Originally hosted at http://evachan.org/rscripts.html, these R functions were initially written for my own research. Over the years, I've updated them (and fixed bugs) based on suggestions from users. If you find them useful in your own research, please cite this git repository. If you spot bugs or have suggestions for improvement, please let me know. Or, better, submit a pull request :)

[Statistical Functions](https://github.com/ekfchan/evachan.org-Rscripts#statistical-functions)
[Plotting Functions](https://github.com/ekfchan/evachan.org-Rscripts#plotting-functions)
[Example Data](https://github.com/ekfchan/evachan.org-Rscripts/blob/master/README.md#example-data-and-usage)

###The geno object

The geno object (see exampleI.R), on which all analyses depend, was originally written for diploid data. All R functions in the repository should be applicable to diploid data. In some cases, they should also be applicable to multi-allelic data. When these functions were written, they were geared towards genotyping array data. Keep that in mind when using these scripts.

###Statistical Functions

geno_to_allelecnt.R
A function to convert biallelic unphased SNP genotypes, such as {AA,CC,GG,TT,AC,AG,AT,CG,CT,GT}, to number of copies/counts {0,1,2} of the reference (or arbitrary) allele.
[See example II and simgeno.R for example and usage.]
simgeno.R
Very simple function to generate a biallelic unphased SNP genotype matrix in the format {AA,CC,GG,TT,AC,AG,AT,CG,CT,GT}. Used predominantly to test geno_to_allelecnt.R.
[See example II for usage and purpose.]
calc_EHHS.R
A function to calculate the normalised homozygosity between the i-th and j-th loci, EHHS(geno)i,j, for a given chromosome / linkage group (Tang, Thornton and Stoneking 2007).
calc_iES.R
A function to calculate the integrated EHHS statistic, iES, as described in Tang, Thornton and Stoneking (2007). You'd probably want to calculate the EHHS first!
calc_LD.R
Given a biallelic genotype matrix, calculates one or more measures of linkage disequilibrium between all locus-pairs. The available LD measures include: D, D', r2, X2 (chi-square), X2' (chi-square-prime).
[See example I for example data and usage.]
calc_snp_stats.R
A function to calculate basic SNP stats, including: allele frequency (p), MAF (minor allele frequency), MGF (minor genotype frequency), and tests for deviation from HWE (X2 test and Fisher's Exact test).
[See example I for example data and usage.]
gwas_lm.R
Performs single-locus (SNP) genome-wide association tests for one or more traits simultaneously under one or more of five inheritance models (additive, co-dominance, dominance, recessive, over-dominance) using linear regression.
calc_hwe_fisher.R
A script to test for deviation from HWE using Fisher's Exact test. This test is also incorporated into calc_snp_stats.R.
[See example I for example data and usage.]
calc_hwe_chisq.R
A script to test for deviation from HWE using Pearson's Chi-Squared test. This test is also incorporated into calc_snp_stats.R.
[See example I for example data and usage.]
calc_neiFis_multispop.R
A script to calculate inbreeding coefficients, Fis, for each sub-population using a given set of SNP markers.
[See example I for example data and usage.]
calc_neiFis_onepop.R
A script to calculate inbreeding coefficients, Fis, for a given population using a given set of SNP markers.
[See example I for example data and usage.]
calc_wcFstats.R
A script to estimate the variance components and fixation indices as described in Weir & Cockerham 1984 Evolution 38(6) : 1358-1370.
[See example I for example data and usage.]
calc_wcFst_spop_pairs.R
A script to estimate Fst (theta) values for each pair of sub-populations using the method of Weir & Cockerham 1984 Evolution 38(6): 1358-1370.
[See example I for example data and usage.]
calc_allele_sharing.R
Calculates allele sharing distances between pairs of individuals (cf. Gao & Starmer 2007 BMC Genetics 8:34).
[See example I for example data and usage.]
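
To make a few of the statistics listed above concrete, here is a minimal base-R sketch on a toy matrix. It is not the code in this repository: the matrix layout (individuals in rows, SNPs in columns), the {0,1,2} coding, and the names `toy` and `hwe_chisq` are assumptions made for illustration, and the repository's functions may expect a different layout or scaling. It assumes no missing genotypes and no monomorphic SNPs.

```r
# Toy genotype matrix (ASSUMED layout: rows = individuals, columns = SNPs),
# coded as copies {0,1,2} of one allele. This is not the repository's geno object.
toy <- matrix(c(0, 1, 2,
                1, 1, 0,
                2, 1, 0,
                0, 2, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("ind", 1:4), paste0("snp", 1:3)))

# Frequency p of the counted allele, and minor allele frequency, per SNP.
p   <- colMeans(toy) / 2
maf <- pmin(p, 1 - p)

# Pearson chi-square test (1 df) for deviation from HWE at a single SNP.
# Hypothetical helper; g is a vector of 0/1/2 counts without NAs.
hwe_chisq <- function(g) {
  n    <- length(g)
  p    <- sum(g) / (2 * n)
  obs  <- c(sum(g == 0), sum(g == 1), sum(g == 2))
  expd <- n * c((1 - p)^2, 2 * p * (1 - p), p^2)
  x2   <- sum((obs - expd)^2 / expd)
  c(X2 = x2, p.value = pchisq(x2, df = 1, lower.tail = FALSE))
}

# Allele sharing distance between individuals: at a biallelic locus the number
# of alleles NOT shared equals |g_i - g_j|, so the per-locus average is a
# Manhattan distance divided by the number of loci.
asd <- as.matrix(dist(toy, method = "manhattan")) / ncol(toy)
```

Calling `hwe_chisq(toy[, "snp1"])` tests a single SNP, and `asd` is a symmetric individual-by-individual matrix that could be passed on to a clustering or tree-building routine.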

###Plotting Functions

plclust_in_colour.R
A modification of (wrapper to) plclust for plotting hclust (hierarchical cluster) objects with coloured leaf labels.
plot_marker_lox.R
Generates a visual representation of the genetic positions of a set of markers.
plot_markers_by_set.R
A function to plot sets of markers on a map where the markers are coloured based on a defined variable.

###Example Data and Usage

#####Example I
exampleI.R
Download and read exampleI.R first. This script contains a few very simple lines of code that create a geno and a subpop object and show how they are used with the following functions:

    calc_wcFstats(geno, subpop)
    calc_wcFst_spop_pairs(geno, subpop)
    calc_neiFis_onepop(geno)
    calc_snp_stats(geno)
    calc_neiFis_multispop(geno,subpop)
    calc_LD(geno)
    calc_allele_sharing(geno)  
    calc_hwe_chisq(geno)
    calc_hwe_fisher(geno)

exampleI_data.RData
An R workspace containing the geno and subpop objects used in exampleI.R; i.e. the actual datasets corresponding to the outputs in exampleI.R.
exampleI_functions.RData
An R workspace containing all functions used in exampleI.R.

#####Example II

    geno <- simgeno()
    alleleCount <- geno_to_allelecnt(geno)
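
The kind of conversion performed in Example II can be pictured with a small stand-alone sketch. The helper `count_allele` below is hypothetical and is not the repository's geno_to_allelecnt.R; it only illustrates, under the assumption of two-letter unphased genotype strings, how a genotype such as AG maps to a count in {0,1,2} of a chosen allele.

```r
# Hypothetical helper (NOT the repository's geno_to_allelecnt()):
# count copies of `allele` in two-letter unphased genotype strings.
count_allele <- function(genotypes, allele) {
  a1 <- substr(genotypes, 1, 1)       # first allele of each genotype
  a2 <- substr(genotypes, 2, 2)       # second allele of each genotype
  (a1 == allele) + (a2 == allele)     # 0, 1 or 2 copies; NA stays NA
}

snp <- c("AA", "AG", "GG", "GA", NA)
count_allele(snp, "A")
# → 2 1 0 1 NA
```

Because the genotypes are unphased, AG and GA give the same count. Which allele to count (reference, minor, or arbitrary) is a choice left to the caller in this sketch.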

exampleI.ASdist.nj
Figure: NJ tree from Example I