Visualization Website: https://tonyzhanghm.github.io/adgwas_website/
In this study, we conducted a genome-wide association study (GWAS) of late-onset Alzheimer's disease (LOAD). For experimental details, please refer to the paper.
Docker images: https://hub.docker.com/repository/docker/tonyzhanghm/genetics
Clone the repo: git clone https://github.com/TonyZhanghm/DSC180B_Genome_01.git
To run the whole experiment: python run.py test-project
To run the project step by step: python run.py with the following flags:
- get_data: download the raw data and the required tools.
- filter: filter the dataset with PLINK 1.9. The specific parameter choices can be found in the paper.
- pca: run principal component analysis with PLINK 1.9.
- plot_pca: plot pair plots for the first 5 principal components with seaborn.
- plot_eigenval: plot the scree plot.
- logistic: run the association test with logistic regression.
- manhattan: plot the Manhattan plot with bioinfokit.
- regional: plot regional plots for the nine genes of interest.
- qqplot: plot a QQ plot of the association test results.
- meta: run the meta-analysis with METAL.
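As a rough illustration of what the qqplot step visualizes, the sketch below pairs observed p-values with their expected values under the null and computes the genomic inflation factor. It is not taken from run.py: the function names `qq_points` and `genomic_inflation` are hypothetical, the demo p-values are made up, and the (i - 0.5) / n plotting position is one common convention among several.

```python
import math
from statistics import NormalDist, median


def qq_points(p_values):
    """Pair each observed p-value with its expected value under the null.

    Returns (expected, observed) tuples on the -log10 scale, ordered from
    the least significant test to the most significant one.
    """
    n = len(p_values)
    observed = sorted(p_values, reverse=True)  # largest p-value first
    points = []
    for rank, p in enumerate(observed):
        # Under the null, sorted p-values are roughly uniform on (0, 1);
        # (i - 0.5) / n is the plotting position of the i-th smallest.
        expected = (n - rank - 0.5) / n
        points.append((-math.log10(expected), -math.log10(p)))
    return points


def genomic_inflation(p_values):
    """Median observed 1-df chi-square statistic divided by its null median."""
    std_normal = NormalDist()
    # A two-sided p-value p corresponds to the chi-square statistic z**2,
    # where z is the standard normal quantile at p / 2.
    chi_sq = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    null_median = std_normal.inv_cdf(0.75) ** 2  # about 0.455
    return median(chi_sq) / null_median


if __name__ == "__main__":
    # Illustrative p-values only, not results from this project.
    demo = [0.91, 0.52, 0.33, 0.048, 0.0007]
    for expected, observed in qq_points(demo):
        print(f"{expected:.3f}\t{observed:.3f}")
    print(f"lambda = {genomic_inflation(demo):.3f}")
```

Points that follow the diagonal and an inflation factor near 1 suggest the test statistics behave as expected under the null. A factor well above 1 is often read as a sign of confounding such as population stratification.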
The data will be stored in data/ and the experiment results will be stored in data/output/.
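For readers who want to inspect the PCA output by hand, here is a minimal sketch of the numbers behind a scree plot. It assumes the eigenvalues are available as text with one value per line, which is how PLINK 1.9 writes its .eigenval file. The function name `explained_share` is hypothetical and the demo values are made up. Because PLINK reports only the top components, the shares are relative to the reported eigenvalues, not to the total variance.

```python
def explained_share(eigenval_lines):
    """Convert eigenvalues (one per line) into shares of their own total."""
    # Skip blank lines so a trailing newline in the file does no harm.
    eigenvalues = [float(line) for line in eigenval_lines if line.strip()]
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


if __name__ == "__main__":
    # Made-up eigenvalues for illustration; real ones come from the pca step.
    demo_lines = ["4.0\n", "3.0\n", "2.0\n", "1.0\n"]
    for index, share in enumerate(explained_share(demo_lines), start=1):
        print(f"PC{index}: {share:.1%}")
```

The same function can be applied to an open file handle, since iterating over a text file yields its lines.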
- Request data from the sources: UK Biobank and NIAGADS
- Understand the analysis methods: meta-analysis, Manhattan plot, regional association plot.
- Write a survey of the data you are using, the relationship and appropriateness of the data to the problem under examination, and the context in which the data was created.
- Summarize relevant details of the data generating process, describing the population that the data represents, whether that population is relevant to the question at hand, while addressing possible questions of data reliability.
- Understand how to account for population stratification in our data so that the analysis can extend to populations other than those of European descent.
- no new code added
- Describe the source of the backup dataset, the population that the data represents, whether that population is relevant to the question at hand, while addressing possible questions of data reliability. (Scott)
- Perform preprocessing quality control using PLINK commands (Jared, Tony)
- Statistically assess the quality of the data (Tony)
- EDA (Barplot, PCA, Scatter matrix plot, Scree Plot) (All)
- Perform multi-covariate association analysis with logistic regression (Tony)
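To make the quality-control item in the list above more concrete, the sketch below computes two per-variant metrics that PLINK 1.9 can filter on: the missing call rate (the --geno option) and the minor allele frequency (the --maf option). It is an illustration under assumptions, not the project's code. The function names, the 0/1/2 genotype coding with None for missing calls, and the thresholds in the demo are all hypothetical; the thresholds actually used are given in the paper.

```python
def variant_qc(genotypes):
    """Return (missing_rate, maf) for one variant.

    genotypes: one entry per sample, the count of the alternate allele
    (0, 1, or 2), or None when the call is missing.
    """
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called:
        return missing_rate, 0.0
    # Each called sample carries two alleles.
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return missing_rate, maf


def passes_qc(genotypes, max_missing, min_maf):
    """Keep a variant if it is called often enough and is not too rare."""
    missing_rate, maf = variant_qc(genotypes)
    return missing_rate <= max_missing and maf >= min_maf


if __name__ == "__main__":
    # Made-up genotypes and placeholder thresholds, for illustration only.
    demo = [0, 1, 2, 0, None, 1, 0, 0]
    print(variant_qc(demo))
    print(passes_qc(demo, max_missing=0.2, min_maf=0.05))
```

In practice PLINK applies these filters directly to the genotype files, so a helper like this is mainly useful for spot-checking a handful of variants.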