/data-512-project_part1

Primary language: Jupyter Notebook. License: MIT.

Project Description

This repo contains the final part of the project for the course DATA 512 - Human Centered Data Science.

The goal of this analysis is to explore the

  • impact of masking mandates on the spread of COVID-19 in Oakland County, MI (Part 1)
  • impact of stay-at-home policies on the spread of COVID-19 in Oakland County, MI (Part 2)

For Part 1, we fit a SIRS model to the case data to estimate the parameters that govern the spread of infection. For Part 2, we run a regression analysis that compares the infection doubling rate against changes in mobility relative to the baseline.
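The two quantities named above can be sketched in a few lines of Python. This is a minimal illustration, not the code in the notebooks: the parameter names (beta, gamma, xi), the forward Euler integration, the population figures, and the helper names simulate_sirs and doubling_time are all assumptions made for this example.

```python
import math


def simulate_sirs(beta, gamma, xi, s0, i0, r0, days, dt=0.1):
    """Integrate the SIRS equations with forward Euler steps.

    dS/dt = -beta*S*I/N + xi*R   (xi: rate at which immunity is lost)
    dI/dt =  beta*S*I/N - gamma*I
    dR/dt =  gamma*I - xi*R
    Returns one (S, I, R) tuple per day, starting with day 0.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            recoveries = gamma * i * dt
            lost_immunity = xi * r * dt
            s += lost_immunity - new_infections
            i += new_infections - recoveries
            r += recoveries - lost_immunity
        history.append((s, i, r))
    return history


def doubling_time(cumulative_cases):
    """Estimate the doubling time in days from daily cumulative case counts.

    Fits a least-squares line to log(cases) against the day index and
    converts its slope (the exponential growth rate) to a doubling time.
    """
    ys = [math.log(c) for c in cumulative_cases]
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return math.log(2) / (sxy / sxx)


# Illustrative values only; these are not fitted to the Oakland County data.
history = simulate_sirs(beta=0.3, gamma=0.1, xi=0.01,
                        s0=999_990, i0=10, r0=0, days=365)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print("Infections peak on day", peak_day)
```

Fitting the model then amounts to searching for the beta, gamma, and xi values whose simulated curve best matches the observed cases; a library ODE solver and optimizer would be a natural replacement for the Euler loop used here.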

License

We use the following data sources for this assignment.

The data related to Covid cases can be found here
It is licensed under Attribution 4.0 International (CC BY 4.0)

CDC dataset on masking mandates can be found here
Its licensing information can be found here

Google Community Mobility Reports can be found here
To use this data, you must accept Google's Terms of Service, found here - here

Folder Structure

├── data_clean
│   ├── cases.pq
│   ├── deaths.pq
│   └── mask_mandates.pq
├── data_raw
│   ├── mask-mandate-by-county.csv
│   ├── mask-use-by-county.csv
│   ├── RAW_us_confirmed_cases.csv
│   └── RAW_us_deaths.csv
├── notebooks
│   ├── part1.ipynb
│   └── part2.ipynb
├── README.md
├── requirements.txt
├── src
│   ├── clean_data.py
│   ├── clean_data_part2.py
│   ├── main.py
│   └── model.py
└── visualizations

Input Files

There are four inputs used by the code in this repository.

The cases data is present in data_raw/RAW_us_confirmed_cases.csv
The deaths data is present in data_raw/RAW_us_deaths.csv
The masking mandates data is present in data_raw/mask-mandate-by-county.csv
The community mobility reports are present in data_raw/2020_US_Region_Mobility_Report.csv, data_raw/2021_US_Region_Mobility_Report.csv, data_raw/2022_US_Region_Mobility_Report.csv
Note1: Download the CDC masking mandates dataset from the link above and rename it to mask-mandate-by-county.csv, as it is too large to commit to GitHub
Note2: Download the community mobility reports from the link above, as they are too large to commit to GitHub
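Since some inputs must be downloaded by hand, a quick check before running the cleaning scripts can save a failed run. The snippet below is a small sketch and is not part of the repository; the helper name missing_raw_files is hypothetical, and the file names are the ones listed in this section.

```python
from pathlib import Path

# File names as listed in the Input Files section of this README.
EXPECTED_RAW_FILES = [
    "RAW_us_confirmed_cases.csv",
    "RAW_us_deaths.csv",
    "mask-mandate-by-county.csv",
    "2020_US_Region_Mobility_Report.csv",
    "2021_US_Region_Mobility_Report.csv",
    "2022_US_Region_Mobility_Report.csv",
]


def missing_raw_files(data_dir="data_raw"):
    """Return the expected raw files that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_RAW_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_raw_files()
    if missing:
        print("Missing raw files:", ", ".join(missing))
    else:
        print("All raw input files found.")
```

Run it from the repository root so that the relative data_raw path resolves correctly.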

Files Generated

The following data files are generated by the notebook.

  • data_clean/cases.pq
    Stores the cleaned data of daily cases in the US at the county level
  • data_clean/deaths.pq
    Stores the cleaned data of daily deaths in the US at the county level
  • data_clean/mask_compliance.pq
    Stores the cleaned data of mask compliance in the US at the county level
  • data_clean/mask_mandates.pq
    Stores the cleaned data of masking mandates in the US at the county level

Running the code

Clone this repo using

git clone git@github.com:abhishekiitm/data-512-project_part1.git
cd data-512-project_part1

First install the necessary Python libraries in a virtual environment by executing the following steps in the Terminal (assuming you are running Linux):

$ virtualenv proj_env
$ source proj_env/bin/activate

Then install the libraries using

$ pip install -r requirements.txt

Download the raw files mentioned in the Input Files section if you don't already have them.
Then run the data-cleaning script to generate the cleaned data from the raw data.

$ python src/clean_data_part2.py

Execute the notebook notebooks/visualize.ipynb in your choice of notebook environment (Jupyter Notebook or the VS Code extension).