/DataAnalytics_PredictiveModels

Training and evaluating prediction models for the COVID-19 pandemic using Linear Regression, Logistic Regression, and Random Forest. The data comes from the Centers for Disease Control and Prevention (CDC).

Primary Language: Jupyter Notebook

DataAnalytics_PredictiveModels

This repository focuses on training and evaluating prediction models for the COVID-19 pandemic. The data comes from the Centers for Disease Control and Prevention (CDC), a U.S. health protection agency that collects data about the COVID-19 pandemic, in particular tracking cases, deaths, and trends in the United States. In this analysis, we use the data collected by the CDC to build a data analytics solution for death risk prediction.

The dataset we work with is a sample of the public data released by the CDC in which the outcome for the target feature death_yn is known (i.e., either 'yes' or 'no'). The goal of this homework is to build and evaluate prediction models that capture the relationship between the descriptive features and the target feature death_yn.
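As a rough illustration of what preparing such a dataset could look like, the sketch below encodes the death_yn target as 0/1 and one-hot encodes the descriptive features with pandas. The tiny DataFrame is made up for the example, and the column names age_group and hosp_yn are assumptions rather than the repository's actual feature set; only death_yn is taken from the description above.

```python
import pandas as pd

# Hypothetical miniature sample. Only death_yn is named in this README;
# age_group and hosp_yn are assumed column names used for illustration.
df = pd.DataFrame({
    "age_group": ["0-17", "18-49", "50-64", "65+", "65+", "18-49"],
    "hosp_yn": ["No", "No", "Yes", "Yes", "No", "No"],
    "death_yn": ["No", "No", "No", "Yes", "Yes", "No"],
})

# Encode the binary target as 0/1, ignoring the capitalisation of yes/no.
y = df["death_yn"].str.lower().map({"no": 0, "yes": 1})

# One-hot encode the categorical descriptive features, dropping one level
# per feature to avoid perfectly collinear columns.
X = pd.get_dummies(df[["age_group", "hosp_yn"]], drop_first=True)

print(X.shape)
print(y.tolist())
```

Dropping one level per feature matters mostly for the linear model, where fully redundant dummy columns make the coefficients harder to interpret.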

We carry out the following tasks:

  1. Exploring relationships between feature pairs and selecting/transforming promising features based on a given training set.
  2. Linear Regression.
  3. Logistic Regression.
  4. Random Forest.
  5. Improving Predictive Models.
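The modelling tasks above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the notebook's actual code: the data is synthetic (generated with NumPy rather than loaded from the CDC sample), the 70/30 split and the hyperparameters are arbitrary, and the linear regression output is turned into a class label by thresholding at 0.5, which is one common convention for a binary target.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared features and the 0/1 death_yn target.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
signal = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (signal + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Hold out 30% of the rows for evaluation (assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "linear_regression": LinearRegression(),
    "logistic_regression": LogisticRegression(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    if name == "linear_regression":
        # Linear regression predicts a continuous value; threshold it
        # at 0.5 to obtain a class label comparable with the classifiers.
        pred = (pred >= 0.5).astype(int)
    scores[name] = accuracy_score(y_test, pred)

for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.3f}")
```

Accuracy alone can be misleading when deaths are a small minority of cases, so a real evaluation would likely also look at precision, recall, and a confusion matrix.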