Pipeline for running smartpca, ADMIXTURE and ALDER
srun --pty -A ic_ccb293 -p savio -t 01:00:00 bash -i
module load gcc/6.3.0
module load gsl
module load openblas
module load fftw
module load r/3.5.1
/global/home/users/sandrahui/ccb293/ # data
/global/home/users/sandrahui/ccb293/bin # executables
# copy folder to your home directory
cd ~/
cp -rp /global/home/users/sandrahui/ccb293 .
Population Structure and Eigenanalysis. https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.0020190 https://github.com/chrchang/eigensoft/blob/master/POPGEN/README
cd ~/ccb293/classEx
./bin/smartpca -p classEx.smartpca.par
This program requires data in EIGENSTRAT format (See https://reich.hms.harvard.edu/software/InputFileFormats).
The program also requires a parameter file. See format below.
genotypename: classEx.geno # file with genotype information
snpname: classEx.snp # file with snp information
indivname: classEx.ind # file with individual information
poplistname: classEx.pop.list # list of pops to include in the run.
evecoutname: classEx.evec # output file of eigenvectors.
evaloutname: classEx.eval # output file of all eigenvalues
phylipname: classEx.phyl # file with Fst values across populations
numoutevec: 10 # number of PCs to output
numthreads: 1 # if running interactively, use 1 only
numoutliter: 0 # number of outlier inds to remove.
Plot the output in R using:
# start R
R [enter]
# read output files
evecs <- read.table("classEx.evec")
colnames(evecs) <- c("sample", paste("PC", 1:10, sep=""), "pop")
# filter down to only 4 populations
evecs <- evecs[evecs$pop %in% c("YRI", "CEU", "CHB", "ASW"),]
evecs$pop <- factor(evecs$pop)
# plot
# png("classEx.PCA.png") # uncomment to save as png
# pdf("classEx.PCA.pdf") # uncomment to save as pdf
plot(x=evecs$PC1, y=evecs$PC2, col=factor(evecs$pop), xlab="PC1", ylab="PC2", pch=16)
legend("bottomright", legend=levels(evecs$pop), col=1:length(levels(evecs$pop)),pch=16)
# dev.off() # uncomment to save image
# quit R
q() [enter]
Fast model-based estimation of ancestry in unrelated individuals.https://genome.cshlp.org/content/19/9/1655.full http://www.genetics.ucla.edu/software/admixture/admixture-manual.pdf
# Set K=1..6
./bin/admixture --cv classEx.bed $K
This program requires input in plink format (http://zzz.bwh.harvard.edu/plink/data.shtml). Note, K = number of clusters. Run the command for each value of K of interest (for example, 1..6)
To read in the ADMIXTURE results:
# start R
R [enter]
# read output files
admix_k1 <- read.table("classEx.1.Q")
admix_k2 <- read.table("classEx.2.Q")
admix_k3 <- read.table("classEx.3.Q")
admix_k4 <- read.table("classEx.4.Q")
admix_k5 <- read.table("classEx.5.Q")
admix_k6 <- read.table("classEx.6.Q")
# plot
# pdf("classEx.admixPlot.pdf") # uncomment to save as pdf
# png("classEx.admixPlot.png") # uncomment to save as png
par(mfrow=c(8,1), mar=c(1,4,1,1))
par(mfrow=c(6,1), mar=c(1,4,1,1))
barplot(t(as.matrix(admix_k1)), col=rainbow(1), border=NA)
barplot(t(as.matrix(admix_k2)), col=rainbow(2), border=NA)
barplot(t(as.matrix(admix_k3)), col=rainbow(3), border=NA)
barplot(t(as.matrix(admix_k4)), col=rainbow(4), border=NA)
barplot(t(as.matrix(admix_k5)), col=rainbow(5), border=NA)
par(mar=c(3,4,1,1))
x <- barplot(t(as.matrix(admix_k6)), col=rainbow(6), border=NA)
# add pop labels
inds <- c("ASW", rep("", 61), "CEU", rep("", 99), "CHB", rep("", 103), "TSI", rep("", 107), "YRI", rep("", 108))
mtext(inds, 1, at=x, las=1, adj=0)
# dev.off() # uncomment to save image
# Quit R
q() [enter]
Inferring Admixture Histories of Human Populations Using Linkage Disequilibrium. http://www.genetics.org/content/193/4/1233.full https://github.com/priyamoorjani/CCB293/blob/master/ALDER.txt
./bin/alder -p classEx.alder.par &> alder.log
This program requires data in EIGENSTRAT format (See https://reich.hms.harvard.edu/software/InputFileFormats). The input geno, snp, and ind files must be consistent.
The program also requires a parameter file. See format below.
genotypename: classEx.geno # file with genotype information
snpname: classEx.snp # file with snp information
indivname: classEx.ind # file with individual information
admixpop: ASW # name of admixed population
refpops: YRI;CEU # name of reference (ancestral) population (seperated by semicolon).
checkmap: NO # check genetic map included in snp file.
Included in the log file.