/Covid-19-Prediction

Designing different classifiers to predict the outcome when a new person is admitted to the hospital due to covid-19, as well as comparing performances of the classifiers and finding optimal hyperparameters for each of them.

Primary LanguageJupyter Notebook

Covid 19 Prediction for Machine Learning Project

Description of the Project

The data used in this project will help to identify whether a person is going to recover from coronavirus symptoms or not based on some pre-defined standard symptoms. These symptoms are based on guidelines given by the World Health Organization (WHO).

This dataset has daily level information on the number of affected cases, deaths and recovery from 2019 novel coronavirus. Please note that this is a time series data and so the number of cases on any given day is the cumulative number.

The data is available from 22 Jan, 2020. Data is in “data.csv”. The data is cleaned and preprocessed.

The dataset contains 14 major variables that will be having an impact on whether someone has recovered or not, the description of each variable are as follows:

  • Country: where the person resides
  • Location: which part in the Country
  • Age: Classification of the age group for each person, based on WHO Age Group Standard
  • Gender: Male or Female
  • Visited_Wuhan: whether the person has visited Wuhan, China or not
  • From_Wuhan: whether the person is from Wuhan, China or not
  • Symptoms: there are six families of symptoms that are coded in six fields.
  • Diff_Sym_Hos
  • Result: death (1) or recovered (0)

    Project Task

    The Project Goal was to design different classifiers to the predict the outcome (death/recovered) when a new person is admitted to the hospital.

    I divided the data into three partitions: training, validation, and testing then I designed the following classifiers:

  • K-Nearest Neighbors
  • Logistic Regression
  • Naïve Bayes
  • Decision Trees
  • Support Vector Machines

    For each classifier, I got the optimal hyperparameters as well as comparing the performance of all classifiers using different metrics such as the precision, recall, F1-score, and ROC/AUC curves.