CS6220_Project_Covid-19

TOPIC: DATA MINING TECHNIQUES APPLIED TO A CORONAVIRUS DATASET

TEAMMATES: Ankita Mahapatra, Nesara Madhav, Siddhesh Latkar

ABSTRACT

In late December 2019, a cluster of unexplained pneumonia cases was reported in Wuhan, China. A few days later, the causative agent was identified as a novel coronavirus. The virus was named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) by the International Committee on Taxonomy of Viruses, and the disease it causes was named coronavirus disease 2019 (COVID-19) by the World Health Organization. The COVID-19 epidemic has since spread from China to the rest of the world.

The virus is highly contagious, as its rapid spread across the globe shows. Health agencies need estimates of the number of confirmed COVID-19 cases to make quick decisions and prepare for eventualities, and hospital systems need to be ready for large numbers of infected patients. In many countries, hospitals have not been able to cope with the sudden influx of patients, which has led authorities to impose lockdowns. These lockdowns have disrupted normal life and hurt the world economy.

Given how contagious the virus is, authorities must prepare in advance for every eventuality. Case studies have shown that if precautions are not taken in time, casualties can be very high, because one infected person can spread the virus to many others. Accurate prediction of confirmed COVID-19 cases can therefore help protect lives and lessen the impact on the economy.

The purpose of this project is to review the features of COVID-19 and comment on the predicted rate of spread of the virus. We used regression algorithms such as Gradient Boosting and Random Forest to predict confirmed COVID-19 cases, ran several experiments on the dataset with these algorithms, and compared the results. Overall, Gradient Boosting performed better than Random Forest: its predicted confirmed-case counts were closer to the actual counts.
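The README does not say which error metric was used to judge which model's predictions were "closer to actual confirmed cases." Below is a minimal sketch of such a comparison, assuming root-mean-squared error (RMSE) as the deciding metric with mean absolute error (MAE) reported alongside it. The helper names `rmse`, `mae`, and `compare_models` are hypothetical and not taken from the project.

```python
import math

def rmse(actual, predicted):
    """Root-mean-squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def compare_models(actual, predictions):
    """Score each model and return (best_model_name, scores).

    `predictions` maps a model name to its predicted confirmed-case counts.
    Assumption: the "best" model is the one with the lowest RMSE.
    """
    scores = {
        name: {"rmse": rmse(actual, pred), "mae": mae(actual, pred)}
        for name, pred in predictions.items()
    }
    best = min(scores, key=lambda name: scores[name]["rmse"])
    return best, scores
```

A call would look like `compare_models(actual_cases, {"gradient_boosting": gb_pred, "random_forest": rf_pred})`, where all three lists are hypothetical placeholders for held-out actual counts and each model's predictions. RMSE penalizes large misses more heavily than MAE, so the two metrics can disagree when one model makes a few big errors.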

Techniques used:

  • SVM (Support Vector Machine)
  • Random forest
  • Gradient boosting
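The README does not describe how the models were configured. To illustrate how the two tree ensembles listed above differ, here is a minimal from-scratch sketch in Python (standard library only) on a single numeric feature. Assumptions: `fit_stump`, `predict_stump`, `gradient_boosting`, and `bagged_stumps` are hypothetical helpers, and every base learner is a depth-1 regression tree (a stump). With one feature, the random forest side reduces to plain bagging (bootstrap sampling plus averaging); a real random forest grows deeper trees and samples a random subset of features at each split. Gradient boosting instead fits each new stump to the residuals of the current model, which for squared-error loss are the negative gradient.

```python
import random

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree (one split on x) minimizing squared error.

    Returns (threshold, left_value, right_value): points with x <= threshold
    are predicted as left_value, all others as right_value.
    """
    pairs = sorted(zip(xs, ys))
    mean_all = sum(ys) / len(ys)
    best_err = sum((y - mean_all) ** 2 for y in ys)
    best = (pairs[-1][0], mean_all, mean_all)  # fallback: no split, predict the mean
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = sum((y - left_mean) ** 2 for y in left) + sum((y - right_mean) ** 2 for y in right)
        if err < best_err:
            best_err = err
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, left_mean, right_mean)
    return best

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Sequentially fit stumps to residuals (negative gradient of squared loss)."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    residuals = [y - base for y in ys]
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each stump's contribution by the learning rate, then update residuals.
        residuals = [r - learning_rate * predict_stump(stump, x) for x, r in zip(xs, residuals)]
    return lambda x: base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def bagged_stumps(xs, ys, n_models=50, seed=0):
    """Fit stumps on bootstrap samples and average them (bagging, one feature)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n points with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(predict_stump(s, x) for s in stumps) / len(stumps)
```

For example, `gb = gradient_boosting(days, cases)` followed by `gb(day)` returns a prediction, where `days` and `cases` are hypothetical training lists. Because both ensembles are built from trees, their predictions are piecewise constant in x and stay flat beyond the largest training value, which matters when forecasting a growing case count.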

Report: https://docs.google.com/document/d/1I_O3tIfhW90cIOHv1c9sjPlA2I5U-Vh5sKl0XlaCmI4/edit#