This repository collects the Jupyter notebooks that implement the models covered in the master's thesis "Analyzing the influence of physical and genetic traits in cross training performance with machine learning".
Author: Ana González Guerra, student in the Master's in Data Science at the University of Cantabria
Tutor: Cristina Tirnauca
Cotutor: Adrián Odriozola Martínez
Presentation date: 26th June 2020
The aim of this work is to study the influence that genetics and physical condition have, or may have, on sports performance, specifically in cross training, with target variables related to power and percentage of fat-free mass. We work with data generated in Wingate tests: 30-second exercises performed at the maximum possible intensity on an ergometer. During the test, physical predictors that give an idea of the individual's anaerobic capacity are measured. Two tests of this type were carried out with a 5-minute interval, during which both the above-mentioned physical predictors and the target variables of interest were continuously monitored. Prior to the tests, the genetic variants (SNPs) presented by the individuals were analyzed for a set of 209 genes, along with physical predictors of anthropometric character. To achieve this objective, different machine learning models were studied: Random Forest, Bayesian Networks, traditional and optimized Multiple Linear Regression (with Best Subset Selection, Principal Component Analysis and Partial Least Squares) and K-Nearest Neighbors. It was found that the truly interesting relationships appear when both genetic and physical predictors are combined.
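
As a rough illustration of the kind of comparison described above (physical predictors alone, genetic predictors alone, and both combined), here is a minimal sketch using a hand-written K-Nearest Neighbors regressor. It is not taken from the notebooks in this repository: the data are synthetic, the function names (`knn_predict`, `loo_mae`, `select_columns`), the column layout, and the 0/1/2 genotype coding are all assumptions made for the example, and a pure-Python implementation is used only to keep it self-contained.

```python
import math
import random


def knn_predict(train_X, train_y, query, k=3):
    """Predict a numeric target as the mean of the k nearest training targets."""
    # Euclidean distance from the query point to every training row.
    scored = sorted(
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(row, query))), target)
        for row, target in zip(train_X, train_y)
    )
    nearest = scored[:k]
    return sum(target for _, target in nearest) / len(nearest)


def loo_mae(X, y, k=3):
    """Leave-one-out mean absolute error of the k-NN regressor."""
    errors = []
    for i in range(len(X)):
        # Hold out row i and predict it from all remaining rows.
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        errors.append(abs(knn_predict(rest_X, rest_y, X[i], k) - y[i]))
    return sum(errors) / len(errors)


def select_columns(X, cols):
    """Keep only the given column indices of each row."""
    return [[row[c] for c in cols] for row in X]


# Synthetic data (assumption): two standardized physical measures and two SNPs
# coded as 0/1/2, with a target that depends on both kinds of predictor.
random.seed(0)
X, y = [], []
for _ in range(40):
    physical = [random.gauss(0, 1), random.gauss(0, 1)]
    genetic = [random.choice([0, 1, 2]), random.choice([0, 1, 2])]
    X.append(physical + genetic)
    y.append(physical[0] * genetic[0] + random.gauss(0, 0.1))

feature_sets = {"physical": [0, 1], "genetic": [2, 3], "combined": [0, 1, 2, 3]}
errors = {
    name: loo_mae(select_columns(X, cols), y)
    for name, cols in feature_sets.items()
}
for name, err in errors.items():
    print(f"{name:>8}: leave-one-out MAE = {err:.3f}")
```

The printed errors only describe this toy dataset and say nothing about the real Wingate or SNP data. On real data, predictors would normally be scaled before computing distances, and `k` would be tuned by cross-validation.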