This is, for the time, a directory containing hacky scripts I have used for dealing with genomic data in R. They are in various states of completion. When frequetly used, they git pushed into developing R packages or pipelines.
It also contains some data sets I use for honey bee data. I will add to this occasionally.
"Hunt_markers.txt" contains a list of markers from Greg Hunt. These markers are from his QTL papers and the set here contains those. Some of these have been published and some have not. I found this list on Greg's desktop computer when I inherited the lab. I uploaded them to Dryad for their continued use.
"Solignac_x_Cornuet_2007_BeeLinkMap_Supp.xls" contains the OG honey bee recombination map.
Functions for dealing with VCF files in R
(outdated) Various useful functions in R
(outdated) Used for running Bustamante's SnIPRE to estimate the selection coefficient on genes in a give genome
script for running SNIPRE to estimate gamma on genes in a genome
Takes in a fasta file, loads it as a data frame
Manhattan plots for GWAS analysis
FDR correction using Storey's Q
(outdated) For Qanbari et al. (2012) creeping window analysis with Fst values. After removing windows that don't contain enough SNP sites,
To come is a general work flow for identifying significantly high Fst windows. using perm.creeper() function and then correcting for multiple tests using Storey's Q, found in Q_correct.r. High Fst windwos can then be collated with IRanges" from the BioconductoR suite
An example of this would be:
# Load Required Packages ------------------------
source("Q_correct.r")
source("VarFunct.r") #for permutation function
#run Permutation ---------------------------------
#"creeper" data frame is generated from creeper() function
head(creeper)
fst num_snps start end
2 0.1428318 3 3790 4296
3 0.1912891 2 3798 4296
5 0.1949204 3 5579 6429
num <- creeper$num_snps
lec <- perm.creeper(n=100000, num=num)
creeper$pnorm.fst <- pnorm(creeper$fstHvL, mean=mean(lec), sd=sd(lec),lower.tail=F)
creeper$q <- qvalue1(creeper$pnorm.fst)$q
This function takes output from Ancestry HMM for many individuals and plots the data as an average +/- SD ancestry of the focal lineage.
usage:
Rscript ancestryPlots.R