This repository covers the training and evaluation of prediction models on a COVID-19 dataset provided by the Centers for Disease Control and Prevention (CDC). The dataset was cleaned before any models were fitted.
The dataset contains de-identified patient data, including COVID-19 severity indicators, outcomes, clinical data and laboratory test results.
The aim of this project was to predict the target outcome of death based on various features in the dataset.
Several models were used, including Linear Regression, Logistic Regression, Decision Tree and Random Forest. Relationships between the target feature and the descriptive features were explored and analysed, and various methods were employed to improve the predictive models.
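
As a rough illustration of how the four model types could be compared, the sketch below fits each one with scikit-learn and reports test-set accuracy. It is not the code from this repository: the data is synthetic (generated with `make_classification` as a stand-in for the cleaned CDC dataset), the binary target is assumed to encode death as 1, the hyperparameters are arbitrary, and the `evaluate` helper is hypothetical. Linear Regression outputs a continuous value, so the sketch assumes a 0.5 threshold to turn it into a class label.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned dataset (assumption: binary target, 1 = death).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# The four model families named above; hyperparameters are illustrative only.
models = {
    "linear_regression": LinearRegression(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}


def evaluate(models, X_train, X_test, y_train, y_test):
    """Fit each model and return its accuracy on the held-out test set."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        if name == "linear_regression":
            # Continuous output: threshold at 0.5 to obtain a class label.
            preds = (preds >= 0.5).astype(int)
        scores[name] = accuracy_score(y_test, preds)
    return scores


scores = evaluate(models, X_train, X_test, y_train, y_test)
for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.3f}")
```

Accuracy is used here only for brevity. If deaths are a small share of the cases, precision, recall or F1 would likely give a more informative comparison.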