Random useful functions for Moose's work

Authors: David Hughes

Date started: 31st Jan. 2020


This repo contains a variety of helpful functions that I come back to time and time again. Rather than spreading them across several different packages, I will try to maintain this catch-all that my colleagues and I can source when necessary.

Installation instructions for the moosefun package

1. ensure that the devtools library is installed on your local machine
	 > if (!"devtools" %in% rownames(installed.packages())) install.packages("devtools")
2. install moosefun
	> devtools::install_github("hughesevoanth/moosefun")

iPVs functions have moved to their own package

Date moved: March 6th 2020
For functions to aid in the identification of principal variables in inter-correlated data sets please follow the path below.


converting rsid to SNPID and SNPID to rsid

the useful functions are: rsid_2_chrbp() & chrbp_2_rsid()

  • Do note that these functions are only suited to a handful (1 to a few hundred) of SNPs that need an ID conversion.
  • For each ID to convert, a query is sent over the web, and a built-in sleep step had to be implemented to avoid query crashes, which can still occur.
  • If you can, avoid running these functions on more than ~100 SNPs at a time.
  • If you need to convert thousands of IDs look into using ANNOVAR (https://annovar.openbioinformatics.org/en/latest/) or some other tool.
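The advice above about keeping each run to roughly 100 SNPs can be sketched as a small batching helper. This is only an illustration: convert_in_batches(), its batch_size and pause arguments, and the stand-in fake_converter() are hypothetical names that are not part of moosefun, and the sketch assumes the converter returns a data frame with one row per ID.

```r
## Hypothetical helper (not part of moosefun): run a converter on
## batches of IDs, pausing between batches to be gentle on the web service.
convert_in_batches <- function(ids, converter, batch_size = 100, pause = 1) {
  ## split the IDs into consecutive batches of at most batch_size
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- lapply(batches, function(batch) {
    out <- converter(batch)   # assumed to return a data frame
    Sys.sleep(pause)          # wait before sending the next batch
    out
  })
  ## stack the per-batch data frames into one
  do.call(rbind, results)
}

## Stand-in converter so the sketch runs without any web query
fake_converter <- function(ids) data.frame(id = ids, stringsAsFactors = FALSE)

ids <- paste0("rs", 1:250)
res <- convert_in_batches(ids, fake_converter, batch_size = 100, pause = 0)
```

In real use the converter argument would be rsid_2_chrbp or chrbp_2_rsid, with a pause greater than zero.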


## install biomaRt
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("biomaRt")

## 	load libraries
library(biomaRt)
library(moosefun)

## convert rsids to snpids
## read in the SNP list
snps = c("rs4987657","rs4987667","rs4987682")
## Extract mapping coordinates
map = rsid_2_chrbp(snps)
o = order(map$chr, map$bp)
map = map[o,]

## convert snpids to rsids
## read in the SNP list
snps = c("7:142569596:A_G","7:142572908:T_C","7:142574913:A_G")

## NOTE: the function splits snpids on ":" and only uses the first two 
## strings as chr and bp, all else is ignored

## Extract mapping coordinates
map2 = chrbp_2_rsid(snps)
o = order(map2$chr, map2$bp)
map2 = map2[o,]
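To illustrate the NOTE above about how snpids are split, here is a minimal base R sketch of pulling chr and bp out of the first two ":"-separated fields. It only demonstrates the parsing rule as described; it is not the package's own code, and the object names (parts, chrbp) are made up for the example.

```r
snps <- c("7:142569596:A_G", "7:142572908:T_C", "7:142574913:A_G")

## split each snpid on ":" and keep only the first two fields
parts <- strsplit(snps, ":", fixed = TRUE)
chrbp <- data.frame(
  chr = sapply(parts, `[`, 1),
  bp  = as.integer(sapply(parts, `[`, 2)),
  stringsAsFactors = FALSE
)
## the allele string ("A_G", "T_C", ...) is ignored
```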




  • a function taken from the GenABEL package to perform z-transformations. It is required by my edited version of the rank normal transformation function.
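For readers unfamiliar with the term, a z-transformation centers a variable on its mean and scales it by its standard deviation. A minimal sketch follows; z_transform() is a hypothetical name for illustration and is not the GenABEL or moosefun implementation.

```r
## Illustrative only: center on the mean, scale by the standard deviation
z_transform <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

z <- z_transform(c(2, 4, 6, NA, 8))
## missing values stay missing; the rest have mean 0 and sd 1
```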


  • a function taken from the GenABEL package to perform rank normal transformations, but edited so that tied values are split at random.
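The idea of a rank normal transformation with randomly split ties can be sketched in base R as below. This is an assumption-laden illustration, not the package's code: rank_normal() is a hypothetical name, and the (rank - 0.5) / n offset is one common convention that may differ from what the edited GenABEL function uses.

```r
## Illustrative only: rank the values, break ties at random,
## then map the ranks onto standard normal quantiles
rank_normal <- function(x) {
  n <- sum(!is.na(x))
  r <- rank(x, na.last = "keep", ties.method = "random")
  qnorm((r - 0.5) / n)
}

set.seed(1)                    # ties are split randomly, so fix the seed
x <- c(3, 1, 2, 2, NA, 5)
y <- rank_normal(x)
## the two tied 2s receive different values; the NA stays NA
```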


  • moose_biplot(): a function to plot a biplot, i.e. a PCA with a loadings plot on top of it.

  • however, what is unique here is that we are not plotting the loadings of the variables used in the construction of the PCA.

  • rather, we are passing a novel set of variables | traits | phenotypes that will be correlated to the PC axes (1 and 2) and then plotted.

  • as an example:

    1. generate a prcomp object:

       pca = prcomp( iris[, 1:4] )
    2. or you can generate a probabilistic pca (pcaRes) object with ppca() from the pcaMethods package

       pca = ppca( as.matrix( iris[, 1:4] ), nPcs = 4)
    3. run the function

       moose_biplot(PCA = pca, dataframe_of_phenotypes = iris[, 1:4], 
        plot_top_N_phenotypes = 3, 
        grouping1 = iris$Species, grouping1NAME = "species",
        grouping2 = iris$Species, grouping2NAME =  "species",
        scalearrows = FALSE )
    • the dataframe_of_phenotypes can be any matrix of quantitative traits with the same number of rows as was passed to the prcomp() or ppca() functions.
    • grouping1 dictates the color scheme and the ellipses to be drawn. Currently limited to 9 groups.
    • grouping2 dictates the plot shapes to be used. Currently limited to 5 groups.
    • scalearrows allows you to scale the most strongly correlated trait to a rho of 1, and all other arrows in a corresponding manner. This can be done to aid in visualization. Note: if scalearrows is set to TRUE, the relative length of the arrows remains informative.
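The arrows described above come from correlating each phenotype with PC1 and PC2. A rough base R sketch of that idea is shown here; it is not the internals of moose_biplot(). The object names and the choices of ranking traits by arrow length and of dividing by the largest absolute correlation are assumptions made for illustration.

```r
pca    <- prcomp(iris[, 1:4])
phenos <- iris[, 1:4]

## correlate each phenotype with the first two PC axes
rho <- cor(phenos, pca$x[, 1:2])

## keep the N phenotypes with the longest arrows (assumed ranking rule)
arrow_len <- sqrt(rowSums(rho^2))
top       <- names(sort(arrow_len, decreasing = TRUE))[1:3]
arrows_xy <- rho[top, , drop = FALSE]

## optional scaling: the largest absolute correlation becomes 1,
## so the relative lengths of the arrows are preserved
scaled_xy <- arrows_xy / max(abs(arrows_xy))
```

Each row of arrows_xy (or scaled_xy) gives the x and y end point of one arrow drawn from the origin of the PC1 vs. PC2 plot.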