COVID-19 Vaccination Rollout Project
This is a data analysis project, written in Python with the pandas library for my CSE 163 Intermediate Programming class at UW. The goal of this project is to analyze the speed of the COVID-19 vaccine rollout across different jurisdictions and see how it has affected the virus's spread. Many variables and players are involved in this process, including the effectiveness and reach of distribution networks in different countries and states, as well as who is getting the vaccines and when.
Because inefficient distribution can cost lives, stakeholders and the public have a right to understand clearly how quickly the COVID-19 vaccine rollout is progressing. This project computes per capita case and vaccination rates across different countries and jurisdictions, breaks these statistics down by age group and ethnicity in the United States, and visualizes the results. The aim is to reveal bottlenecks in the process so that resources can be directed at fixing them.
Running the Code
Note: this project has only been tested on Windows 10.
To run the code, install Python 3.9, as well as the latest pip version for Python 3. Make sure both are in your environment PATH.
The following command will install all required third-party packages (zipfile and shutil are part of the Python standard library and do not need to be installed):
pip install numpy pandas geopandas matplotlib requests
A note about geopandas on Windows: geopandas requires fiona, which depends on gdal. They can be found here:
https://www.lfd.uci.edu/~gohlke/pythonlibs/#gdal
https://www.lfd.uci.edu/~gohlke/pythonlibs/#fiona
Install them with:
pip install path/to/gdal.whl
pip install path/to/fiona.whl
When you have finished setting up the environment, run main.py to initiate the project. The datasets and visualizations will be output to the datasets and visualizations folders, respectively.
Research Questions
In order to accomplish the goal described at the top of this document, I created four research questions:
- What is the average number of COVID-19 vaccinations per capita per day by country? Vaccine development, certification, and distribution infrastructure vary by country. I want to figure out which countries have the most effective processes for reaching herd immunity.
- How does the average number of COVID-19 vaccinations per capita per day relate to the number of new COVID-19 cases by country? I want to find how the speed of the vaccination rollout by country affects the spread of COVID-19, and whether it contributes to slowing down the virus.
- What is the average number of COVID-19 vaccinations per capita per day by state, as well as age and ethnicity, in the United States? Specifically for the United States, I want to examine how well different states are implementing the COVID-19 vaccination rollout by age and ethnicity.
- How does the average number of COVID-19 vaccinations per capita per day relate to the number of new COVID-19 cases by state, as well as age and ethnicity, in the United States? I want to find how the speed of the vaccination rollout by state affects the spread of COVID-19, and whether it contributes to slowing down the virus for different age groups and ethnicities.
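The metric shared by all four questions, average vaccinations per capita per day, can be sketched with pandas roughly as follows. This is an illustration only, not the code in analysis.py: the column names (jurisdiction, date, total_vaccinations, population) and the function name are assumptions, and the sample data is made up.

```python
import pandas as pd


def avg_daily_vaccinations_per_capita(df: pd.DataFrame) -> pd.Series:
    """Average vaccinations per capita per day for each jurisdiction.

    Assumes hypothetical columns: 'jurisdiction', 'date' (datetime),
    'total_vaccinations' (cumulative doses), and 'population'.
    """
    df = df.sort_values(["jurisdiction", "date"])
    grouped = df.groupby("jurisdiction")
    # Doses administered between the first and last report
    doses = grouped["total_vaccinations"].last() - grouped["total_vaccinations"].first()
    # Days elapsed between the first and last report
    days = (grouped["date"].last() - grouped["date"].first()).dt.days
    population = grouped["population"].first()
    return doses / days / population


# Made-up sample data: two jurisdictions, three daily reports each
sample = pd.DataFrame({
    "jurisdiction": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 2),
    "total_vaccinations": [0, 100, 200, 0, 50, 300],
    "population": [1000, 1000, 1000, 10000, 10000, 10000],
})
rates = avg_daily_vaccinations_per_capita(sample)
```

Dividing by population makes jurisdictions of very different sizes comparable. A jurisdiction with only one report would have zero elapsed days, so a real implementation would need to handle that case.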
Project Structure
This project is structured so that data flows from download, to processing, to analysis, and finally to visualization. The file structure is:
main.py - Main project file, runs the entire data pipeline
data_manager.py - Download and update all datasets
data_processing.py - Process and consolidate all datasets for analysis. This includes case, vaccination, death, geographic, and population data
analysis.py - Calculate per capita case, vaccination, and death data for a variety of jurisdictions, and across time
visualization.py - Plot line graphs, multiple line graphs, and maps of the analyzed data
metadata - A folder containing metadata on all project datasets, and references mapping jurisdiction names to various code formats
sample_visualizations - A folder containing some sample results of running this data analysis project
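The download, process, analyze, visualize flow described above could be wired together along these lines. This is a minimal sketch of the piping idea: every function name here is a hypothetical stand-in for the corresponding module, and the data is a placeholder, not the project's real interface.

```python
from typing import Callable, Dict, List

# All functions below are hypothetical stand-ins for the project's modules.


def download() -> Dict[str, List[dict]]:
    # Stand-in for data_manager.py: return raw rows per dataset
    return {"vaccinations": [{"jurisdiction": "A", "doses": 200, "population": 1000}]}


def process(raw: Dict[str, List[dict]]) -> List[dict]:
    # Stand-in for data_processing.py: consolidate datasets into one list
    return [row for rows in raw.values() for row in rows]


def analyze(rows: List[dict]) -> Dict[str, float]:
    # Stand-in for analysis.py: compute a per capita figure per jurisdiction
    return {row["jurisdiction"]: row["doses"] / row["population"] for row in rows}


def visualize(results: Dict[str, float]) -> List[str]:
    # Stand-in for visualization.py: a text summary instead of a plot
    return [f"{name}: {rate:.3f} doses per capita" for name, rate in sorted(results.items())]


def run_pipeline(stages: List[Callable]):
    # The first stage produces data; each later stage consumes the previous output
    data = stages[0]()
    for stage in stages[1:]:
        data = stage(data)
    return data


summary = run_pipeline([download, process, analyze, visualize])
```

Keeping each stage as a function that takes only the previous stage's output makes it possible to rerun or test one stage without touching the others.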
Datasets
Note: The raw dataset URLs can be found in metadata\datasets.csv
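Some datasets may arrive as zip archives. A refresh step using only the standard library's zipfile and shutil modules might look like the sketch below. The function name and the datasets target folder layout are assumptions, not the actual code in data_manager.py. The archive is built in memory so the example runs without network access. In real use the bytes would typically come from something like requests.get(url).content.

```python
import io
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_archive(archive_bytes: bytes, target_dir: Path) -> list:
    """Replace target_dir with the contents of a zip archive held in memory."""
    if target_dir.exists():
        shutil.rmtree(target_dir)  # drop stale files from an earlier run
    target_dir.mkdir(parents=True)
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        archive.extractall(target_dir)
        return sorted(archive.namelist())


# Build a tiny archive in memory (made-up contents) to exercise the function
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("cases.csv", "jurisdiction,cases\nA,10\n")

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "datasets"
    names = extract_archive(buffer.getvalue(), target)
    extracted_text = (target / "cases.csv").read_text()
```

Clearing the target folder before extracting avoids mixing files from different dataset versions, at the cost of downloading everything again on each update.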