Plotting_GeneticOutliers

Plotting multiple principal component analysis (PCA) plots to detect genetic outliers in GenABEL data sets.

Primary language: R

Plotting multiple principal components (PCs) from classical multidimensional scaling (cmd) output and marking outlier clusters based on PC1 vs. PC2.

The function is mainly designed to be integrated into the GenABEL workflow.

More specifically, it targets the detection of genetic outliers with the identity-by-state (IBS) procedure (see the quality control chapter, section 5.2 of the GenABEL tutorial).
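The IBS step that feeds this function can be sketched in base R as follows. This is a simplified, unweighted stand-in for GenABEL's own ibs() function and does not use GenABEL at all; the genotype matrix, its 0/1/2 coding and the sample names are simulated, hypothetical data. It turns average IBS sharing into a distance, runs classical multidimensional scaling with cmdscale(), and plots the first three coordinates pairwise.

```r
# Hypothetical genotype data: 30 samples x 200 SNPs, coded 0/1/2
# (number of copies of the minor allele); all values are simulated.
set.seed(42)
n_samples <- 30
n_snps <- 200
freq <- runif(n_snps, 0.1, 0.5)                       # assumed allele frequencies
geno <- sapply(freq, function(p) rbinom(n_samples, size = 2, prob = p))
rownames(geno) <- sprintf("sample%02d", seq_len(n_samples))  # IDs as rownames

# Per SNP, IBS = 1 - |g_i - g_j| / 2 (2, 1 or 0 shared alleles -> 1, 0.5, 0).
# 1 - mean IBS over all SNPs equals the Manhattan distance divided by 2 * n_snps.
ibs_dist <- dist(geno, method = "manhattan") / (2 * n_snps)

# Classical multidimensional scaling; the sample IDs carry over as rownames
pcs <- cmdscale(ibs_dist, k = 3)
colnames(pcs) <- paste0("PC", 1:3)

# Scatter plots of every pair of the first three coordinates
pairs(pcs)
```

Because cmdscale() copies the distance labels into the row names of its output, the sample IDs stay attached to the coordinates, which is what the plotting function expects.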

Description

Although the function is embedded in the GenABEL workflow, it can be used with any cmd or principal component matrix. The sample or ID names must be stored as the row names of the matrix. The function draws several MDS plots to verify the genetic outliers identified in the PC1 vs. PC2 plot. Optionally, it also returns a vector with the IDs of the main sample cluster, i.e. the samples that should be kept in the analysis.
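The idea of "mark outliers on PC1 vs. PC2, check them on other PC pairs, return the IDs to keep" can be illustrated with a small sketch. The helper main_cluster_ids() below is hypothetical and is not the repository's function; in particular, using k-means on PC1/PC2 and treating the largest cluster as the main one is an assumption made here for illustration, and the author's code may define outlier clusters differently.

```r
# Hypothetical helper (not the repository's function): cluster samples on
# PC1 vs. PC2 with k-means and treat the largest cluster as the main one.
main_cluster_ids <- function(pcs, centers = 2, max_pc = 4, plot = TRUE) {
  if (is.null(rownames(pcs))) stop("sample IDs must be stored as rownames(pcs)")
  if (ncol(pcs) < 2) stop("need at least two principal components")

  km <- kmeans(pcs[, 1:2], centers = centers, nstart = 10)
  sizes <- table(km$cluster)
  main <- as.integer(names(sizes)[which.max(sizes)])   # label of the largest cluster
  keep <- km$cluster == main

  if (plot) {
    n_pc <- min(ncol(pcs), max_pc)
    pc_pairs <- combn(n_pc, 2)                          # e.g. 1-2, 1-3, 2-3, ...
    op <- par(mfrow = n2mfrow(ncol(pc_pairs)))
    on.exit(par(op))
    for (j in seq_len(ncol(pc_pairs))) {
      a <- pc_pairs[1, j]
      b <- pc_pairs[2, j]
      # main cluster in black, candidate outliers (from PC1 vs. PC2) in red
      plot(pcs[, a], pcs[, b], col = ifelse(keep, "black", "red"),
           xlab = paste0("PC", a), ylab = paste0("PC", b))
    }
  }
  rownames(pcs)[keep]                                    # IDs to keep
}

# Usage with simulated coordinates: 45 main samples plus 5 shifted ones
set.seed(7)
pcs <- rbind(
  matrix(rnorm(45 * 3, sd = 0.01), ncol = 3),
  matrix(rnorm(5 * 3, mean = 0.08, sd = 0.01), ncol = 3)
)
rownames(pcs) <- paste0("ID", 1:50)
keep_ids <- main_cluster_ids(pcs)
length(keep_ids)
```

Colouring every PC pair by the membership derived from PC1 vs. PC2 is what makes the extra plots useful: a group that is red on PC1/PC2 but does not separate on the other axes deserves a second look before those samples are dropped.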

The input arguments of the function are described within the code. Since the function is not part of an official package, read the comments in the first lines of the file before using it.

Installing

Clone the repository, or download it as a .zip file using the respective button on the main page. After unzipping, you will find the function in the folder outlier_function.

Once the file is stored on your hard drive, run:

source("path/to/where/the/function/is/stored")

This loads the function into the global environment of your current R session.
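If you are unsure which objects a sourced file defines, you can first evaluate it in a scratch environment and list its contents, then source it normally. The sketch below uses a temporary stand-in script with a hypothetical function name, dummy_outlier_plot; replace the temporary path with the actual path of the downloaded file.

```r
# Write a stand-in script; 'dummy_outlier_plot' is a hypothetical name,
# not the repository's function.
script <- tempfile(fileext = ".R")
writeLines("dummy_outlier_plot <- function(pcs) invisible(rownames(pcs))", script)

# 1) Inspect: evaluate the file in a scratch environment and list its objects
scratch <- new.env()
source(script, local = scratch)
print(ls(scratch))

# 2) Load: with the default local = FALSE, source() evaluates the file
#    in the global environment
source(script)
exists("dummy_outlier_plot", envir = globalenv(), mode = "function")
```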

Code comments

Within the function, I describe in detail what each code block does. If you use RStudio and associated the .R extension with it during installation, simply double-click the file to open it in RStudio, where you can read the source code and the comments.