COVID-19_Tracker-_European_Infection_Analysis

Tracking Daily COVID-19 Infection Count in European Countries using Hadoop MapReduce and Python

Table of Contents
  1. Description
  2. Steps
  3. How to run
  4. Results
  5. Libraries used
  6. Source

Description

This project uses Hadoop MapReduce and Python to track and analyze the daily COVID-19 infection count in European countries. Hadoop's distributed processing model allows large volumes of data to be handled efficiently and summarized into meaningful insights.

Steps

  • Collecting reliable COVID-19 data for European countries (from the WHO in this project).
  • Preparing and formatting the data for Hadoop MapReduce.
  • Setting up Hadoop (standalone mode in this project) for distributed processing.
  • Implementing the MapReduce job in Python.
  • Analyzing daily COVID-19 infections in European countries.
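
The data preparation step could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual code: the column names ("location", "date", "total_cases"), the helper name prepare_rows, and the short country set are all hypothetical, since the README does not show the layout of the data set.

```python
import csv
import io

# Hypothetical subset for illustration; the real country list is not given in the README.
EUROPEAN_COUNTRIES = {"France", "Germany", "Italy", "Spain"}


def prepare_rows(csv_lines, countries=EUROPEAN_COUNTRIES):
    """Yield (location, date, total_cases) for rows of the selected countries.

    Assumes a header row with 'location', 'date' and 'total_cases' columns.
    Rows with a missing or non-numeric case count are skipped.
    """
    for row in csv.DictReader(csv_lines):
        if row.get("location") not in countries:
            continue
        try:
            total = int(float(row["total_cases"]))
        except (KeyError, TypeError, ValueError):
            continue
        yield row["location"], row["date"], total


# Small in-memory sample standing in for the real CSV file.
sample = io.StringIO(
    "location,date,total_cases\n"
    "France,2020-03-01,130\n"
    "Brazil,2020-03-01,2\n"
    "Italy,2020-03-01,\n"
    "Italy,2020-03-02,2036.0\n"
)
prepared = list(prepare_rows(sample))
print(prepared)
```

Filtering and type conversion are done once, up front, so that later stages only see rows they can process.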

How to run

cat data_set.csv | python3 mapper.py | python3 reducer.py

The above command pipes the contents of "data_set.csv" into the mapper script and then pipes the mapper's output into the reducer script, which generates the final output.
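
For readers who want to see the same map-then-reduce data flow inside a single Python process, here is a minimal sketch. The names run_local, toy_map and toy_reduce are hypothetical and the toy functions are not the project's logic. The explicit sort stands in for the sort-by-key step that Hadoop performs between the map and reduce phases; the shell pipeline above has no such step.

```python
from itertools import groupby


def run_local(lines, map_fn, reduce_fn):
    """Apply map_fn to each line, sort the pairs by key, then reduce per key."""
    pairs = [pair for line in lines for pair in map_fn(line)]
    pairs.sort(key=lambda kv: kv[0])  # stands in for Hadoop's shuffle/sort
    return {
        key: reduce_fn(key, [value for _, value in group])
        for key, group in groupby(pairs, key=lambda kv: kv[0])
    }


# Toy functions for illustration only: keep the largest count per location.
def toy_map(line):
    location, count = line.strip().split(",")
    yield location, int(count)


def toy_reduce(key, values):
    return max(values)


result = run_local(["Spain,5", "France,3", "Spain,9"], toy_map, toy_reduce)
print(result)
```

itertools.groupby only merges adjacent items with equal keys, which is why the pairs are sorted before grouping.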

Results

The following visualizations show the output obtained:

Mapper

The mapper script reads data from a CSV file, extracts relevant information, and performs grouping based on location. It calculates the difference in total COVID-19 cases between the first and last recorded dates for each location.

[Figure: Analyzing Mapper 3D plot]
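
The mapper logic described above can be sketched as follows. This is an assumption-laden illustration rather than the contents of mapper.py: the column names ("location", "date", "total_cases"), the date format, and the function name map_case_growth are hypothetical. In a streaming script, sys.stdin would be passed in place of the in-memory sample.

```python
import csv
import io
from datetime import datetime


def map_case_growth(csv_lines):
    """Yield (location, case_difference, days_elapsed) per location.

    Assumes 'location', 'date' (YYYY-MM-DD) and 'total_cases' columns.
    """
    first_last = {}  # location -> [(first_date, cases), (last_date, cases)]
    for row in csv.DictReader(csv_lines):
        try:
            loc = row["location"]
            day = datetime.strptime(row["date"], "%Y-%m-%d").date()
            cases = float(row["total_cases"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or malformed fields
        entry = first_last.setdefault(loc, [(day, cases), (day, cases)])
        if day < entry[0][0]:
            entry[0] = (day, cases)
        if day > entry[1][0]:
            entry[1] = (day, cases)
    for loc, ((d0, c0), (d1, c1)) in sorted(first_last.items()):
        yield loc, c1 - c0, (d1 - d0).days


SAMPLE = (
    "location,date,total_cases\n"
    "Italy,2020-03-01,1000\n"
    "Italy,2020-03-11,12000\n"
    "Spain,2020-03-01,80\n"
    "Spain,2020-03-05,200\n"
)
records = list(map_case_growth(io.StringIO(SAMPLE)))
for loc, diff, days in records:
    print(f"{loc}\t{diff}\t{days}")  # tab-separated, one line per location
```

Emitting the number of elapsed days alongside the case difference is a design choice of this sketch; it gives a later stage what it needs to compute a daily average.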

Reducer

The reducer script processes the output from the mapper script and calculates the average daily increase in cases for each location. The results are then displayed and/or saved in JSON format.
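
A minimal sketch of such a reducer is shown below. It is not the contents of reducer.py: the input format (one tab-separated line per location holding the case difference and the number of elapsed days) and the function name reduce_average_increase are assumptions made for illustration.

```python
import json


def reduce_average_increase(mapper_lines):
    """Return {location: average daily increase} from tab-separated lines.

    Assumes each line is 'location<TAB>case_difference<TAB>days_elapsed'.
    """
    averages = {}
    for line in mapper_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed format
        try:
            location, diff, days = parts[0], float(parts[1]), int(parts[2])
        except ValueError:
            continue
        if days <= 0:
            continue  # a single recorded date gives no daily rate
        averages[location] = round(diff / days, 2)
    return averages


mapper_output = ["Italy\t11000.0\t10\n", "Spain\t120.0\t4\n", "Malta\t0.0\t0\n"]
averages = reduce_average_increase(mapper_output)

# Sort by average daily increase, highest first, before serializing to JSON.
ranked = dict(sorted(averages.items(), key=lambda kv: kv[1], reverse=True))
print(json.dumps(ranked, indent=2))
```

Locations with only one recorded date are dropped here to avoid dividing by zero; a real script might prefer to report them separately.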

Intermediate Line Graph

[Figure: line graph of the reducer output]

Sorted Bar Graph

[Figure: sorted bar graph of the reducer output]

Libraries used

  • json
  • datetime
  • sys
  • itertools
  • matplotlib
  • numpy
  • plotly

Source
