Corona-Dataset-Analysis-Using-PySpark

This module performs statistical analysis on the novel coronavirus dataset. The dataset used was last updated on May 02, 2020. The module performs the following functions:

* Displays the statistics of the input dataset
* Reads data from CSV files and stores the aggregated output in Parquet format
* Counts the number of records for each country/region and province/state
* Lists the maximum cases for each country/region and province/state
* Lists the maximum deaths for each country/region and province/state
* Lists the maximum recoveries for each country/region and province/state

Primary Language: Python
