Random Forest Outperforms Other Phenotype Prediction Algorithms

EPI 511: Advanced Population and Medical Genetics
Prof. Alkes Price, Spring 2019
Harvard T. H. Chan School of Public Health

This project explores using a random forest model to predict phenotype from genotype and compares its performance with two other approaches: k-nearest neighbors (kNN) and standard polygenic risk scoring. I run a simple simulation study using common SNPs from HapMap under two distributions of per-allele effect size, and I also examine how the models perform across varying population structures.
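
To illustrate the kind of simulation described above, here is a minimal sketch in Python. It is not the code in this repository. The two effect-size schemes ("infinitesimal" and "sparse"), the heritability parameter, the uniformly drawn allele frequencies, and all function names are assumptions made for illustration. In particular, genotypes are drawn under Hardy-Weinberg equilibrium instead of being taken from HapMap. The score is computed with the true simulated effects, so it only shows the best case for a polygenic score and does not stand in for any of the fitted models.

```python
import random
import statistics


def simulate_genotypes(n_individuals, allele_freqs, rng):
    """Draw 0/1/2 allele counts per SNP, assuming Hardy-Weinberg equilibrium."""
    return [
        [(rng.random() < p) + (rng.random() < p) for p in allele_freqs]
        for _ in range(n_individuals)
    ]


def draw_effects(n_snps, scheme, rng, causal_fraction=0.1):
    """Per-allele effect sizes under one of two assumed, illustrative schemes."""
    if scheme == "infinitesimal":
        # Every SNP carries a small, normally distributed effect.
        return [rng.gauss(0.0, 1.0) for _ in range(n_snps)]
    if scheme == "sparse":
        # Only a fraction of SNPs are causal; the rest have zero effect.
        return [
            rng.gauss(0.0, 1.0) if rng.random() < causal_fraction else 0.0
            for _ in range(n_snps)
        ]
    raise ValueError(f"unknown scheme: {scheme}")


def genetic_values(genotypes, effects):
    """Additive score: allele counts weighted by per-allele effects."""
    return [sum(g * b for g, b in zip(row, effects)) for row in genotypes]


def simulate_phenotypes(genotypes, effects, heritability, rng):
    """Add Gaussian noise so the genetic share of variance matches heritability."""
    values = genetic_values(genotypes, effects)
    var_g = statistics.pvariance(values)
    var_e = var_g * (1.0 - heritability) / heritability
    return [v + rng.gauss(0.0, var_e ** 0.5) for v in values]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(511)
freqs = [rng.uniform(0.05, 0.5) for _ in range(200)]  # common SNPs only
genotypes = simulate_genotypes(500, freqs, rng)
for scheme in ("infinitesimal", "sparse"):
    effects = draw_effects(len(freqs), scheme, rng)
    phenotypes = simulate_phenotypes(genotypes, effects, 0.5, rng)
    scores = genetic_values(genotypes, effects)
    print(scheme, round(pearson(scores, phenotypes), 3))
```

With a heritability of 0.5, the correlation between the true-effect score and the phenotype should land near the square root of 0.5 in both schemes. A fitted model (random forest, kNN, or a polygenic score with estimated weights) would be trained on one split of such data and evaluated on a held-out split.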

This repository contains my final project, along with the analysis code and the data used for the simulations.