
Primary language: Jupyter Notebook

DataMining

- Exploratory Data Analysis and Data Mining for the COVID-19 pandemic using PySpark

- Short Description & Introduction

This analysis was conducted for a Data Mining course during my postgraduate program in Data Science & Machine Learning. The dataset comes from the non-profit organization Our World in Data and was pulled from the following GitHub link. We are mainly interested in extracting useful information about the spread and effects of the COVID-19 pandemic over the period 05/01/2021 to 30/06/2021. We describe and analyze how the pandemic spread and how it affected individual countries, continents, and the world as a whole.

  • We produce visualizations to answer questions and support the findings of our analysis for the period above. For example: how did the number of new COVID-19 cases evolve in Europe?

Or: which countries had recorded the most total COVID-19 cases by the end of June 2021?
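The first question boils down to filtering the rows to one continent and the study period, then summing daily new cases per month. The notebook does this with PySpark (a filter followed by a group-by and sum); the block below is only a minimal plain-Python sketch of the same aggregation. It assumes OWID-style rows with `continent`, `date` (ISO `YYYY-MM-DD`) and `new_cases` fields, reads 05/01/2021 as 5 January 2021, and uses made-up toy values rather than real figures. The function name `monthly_new_cases` is hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy rows mimicking a few OWID columns; the numbers are made up.
ROWS = [
    {"continent": "Europe", "location": "France", "date": "2021-01-05", "new_cases": 100.0},
    {"continent": "Europe", "location": "Germany", "date": "2021-01-20", "new_cases": 50.0},
    {"continent": "Europe", "location": "France", "date": "2021-02-10", "new_cases": 30.0},
    {"continent": "Asia", "location": "India", "date": "2021-01-10", "new_cases": 999.0},
    {"continent": "Europe", "location": "France", "date": "2020-12-31", "new_cases": 70.0},
    {"continent": "Europe", "location": "Italy", "date": "2021-07-01", "new_cases": 40.0},
    {"continent": "Europe", "location": "Spain", "date": "2021-02-11", "new_cases": None},
]

# Study period, assuming DD/MM/YYYY: 05/01/2021 to 30/06/2021.
START, END = date(2021, 1, 5), date(2021, 6, 30)


def monthly_new_cases(rows, continent="Europe", start=START, end=END):
    """Sum new_cases per (year, month) for one continent within [start, end]."""
    totals = defaultdict(float)
    for row in rows:
        # Filtering on the continent field keeps only country rows; in the OWID
        # CSV, aggregate rows such as location "Europe" leave continent empty.
        if row["continent"] != continent or row["new_cases"] is None:
            continue
        day = date.fromisoformat(row["date"])
        if start <= day <= end:
            totals[(day.year, day.month)] += row["new_cases"]
    return dict(sorted(totals.items()))


print(monthly_new_cases(ROWS))  # {(2021, 1): 150.0, (2021, 2): 30.0}
```

Plotting these monthly totals as a line chart gives the kind of "evolution over time" view the question asks for.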

Countries with the highest total COVID-19 cases up to 30/06/2021

Country          Total COVID-19 cases
United States    33,777,444
India            30,411,634
Brazil           18,570,296
France            5,817,787
Russia            5,449,594
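Because `total_cases` in the OWID data is cumulative, a ranking like the one above can be obtained by taking each country's latest value on or before the cutoff date and sorting in descending order. The block below is a minimal plain-Python sketch of that idea, not the notebook's PySpark code. It assumes OWID-style rows with `iso_code`, `location`, `date` and `total_cases` fields, assumes aggregate rows (World, continents) carry an `iso_code` starting with `OWID_`, and uses made-up toy values. The function name `top_countries` is hypothetical.

```python
from datetime import date

# Hypothetical toy rows; the countries and numbers are placeholders, not real data.
ROWS = [
    {"iso_code": "AAA", "location": "Country A", "date": "2021-06-29", "total_cases": 900.0},
    {"iso_code": "AAA", "location": "Country A", "date": "2021-06-30", "total_cases": 1000.0},
    {"iso_code": "BBB", "location": "Country B", "date": "2021-06-30", "total_cases": 1500.0},
    {"iso_code": "BBB", "location": "Country B", "date": "2021-07-01", "total_cases": 1600.0},
    {"iso_code": "CCC", "location": "Country C", "date": "2021-06-15", "total_cases": 400.0},
    {"iso_code": "OWID_WRL", "location": "World", "date": "2021-06-30", "total_cases": 2900.0},
]


def top_countries(rows, cutoff=date(2021, 6, 30), n=5):
    """Rank countries by their latest cumulative total_cases on or before cutoff."""
    latest = {}  # location -> (date of latest row, total_cases on that date)
    for row in rows:
        # Skip aggregate rows (assumed OWID_ prefix) and missing values.
        if row["iso_code"].startswith("OWID_") or row["total_cases"] is None:
            continue
        day = date.fromisoformat(row["date"])
        if day > cutoff:
            continue
        previous = latest.get(row["location"])
        if previous is None or day > previous[0]:
            latest[row["location"]] = (day, row["total_cases"])
    ranked = sorted(
        ((location, cases) for location, (_, cases) in latest.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:n]


print(top_countries(ROWS, n=2))  # [('Country B', 1500.0), ('Country A', 1000.0)]
```

Taking the latest row per country, rather than summing `new_cases`, avoids errors from reporting gaps or later data revisions that the cumulative column already accounts for.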

Although the uploaded notebook is written in Greek, an English version will also be available soon! :)