COVID Demographic Data Formatting

This project takes the raw data scraped by obastani/covid19demographics and formats it for processing, plotting, and publication. The current focus is limited to plotting COVID-19 infection and death rates over time by age.

Installation & Getting Started

Setting up local environment

If using conda:

conda env create -f environment.yml
conda activate covid

If using venv and pip:

python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

If using neither, make sure you have the packages listed in requirements.txt installed in your Python environment.

Optional Jupyter Lab Installation

If you are not using Jupyter Lab, you can ignore the jupyterlab dependency. If you are, set up your environment as described above, then install Jupyter Lab using your package manager of choice:

conda install jupyterlab

or

pip install jupyterlab

and launch the project with:

jupyter lab --notebook-dir="."

State Data Formatters

Every state in our dataset needs a formatter that reads the raw data from obastani/covid19demographics and extracts the age-group panel data. Once a formatter has been written, add it to the code/formatters.py file and to the state_functions dictionary therein. See the current file for formatters that have already been completed.
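As a rough illustration of the registration step, the sketch below shows one way a state_functions dictionary could map states to formatters. The key format (two-letter state abbreviations), the placeholder body of AL_formatter, and the format_state helper are assumptions made for this example and may not match the actual contents of code/formatters.py.

```python
import pandas as pd


def AL_formatter(state_json):
    """Placeholder body: a real formatter would parse state_json here."""
    return pd.DataFrame()


# Assumed layout: state abbreviation -> formatter function.
state_functions = {
    "AL": AL_formatter,
}


def format_state(raw_json, state, country="USA"):
    """Hypothetical helper: look up a state's formatter and apply it."""
    if state not in state_functions:
        raise KeyError(f"No formatter registered for {state!r}")
    return state_functions[state](raw_json[country][state])
```

With a layout like this, adding a new state means writing its formatter and adding one entry to the dictionary.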

To get a sense of what a formatting function should do, refer to notebooks/SampleCode.ipynb. In brief, a formatter takes the raw JSON for a given state (e.g., raw_json["USA"]["AL"]) and returns a time series of either cases or deaths, with any duplicate or missing dates removed.

Columns should be named "{low}-{high}", one for each age bucket reported. Because states have changed their reporting buckets over time, any date on which a given bucket was not reported should hold NaN for that bucket. The index of the returned dataframe should be the date in datetime.date format (not datetime.datetime).
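The requirements above can be sketched as a minimal formatter. The input structure used here (a list of records with a "date" string and an "age_cases" mapping of bucket to count) is purely hypothetical, as is the name example_formatter; the real raw JSON differs by state, so treat notebooks/SampleCode.ipynb as the reference. Keeping the last record for a duplicated date is also an assumption of this sketch.

```python
import datetime

import pandas as pd


def example_formatter(state_json, field="age_cases"):
    """Build a date-indexed frame of per-bucket counts from hypothetical records."""
    rows = {}
    for record in state_json:
        buckets = record.get(field)
        if not record.get("date") or not buckets:
            continue  # drop entries with no date or no age data
        day = datetime.date.fromisoformat(record["date"])
        rows[day] = buckets  # a later record for the same date replaces an earlier one

    # Building from a list of dicts aligns on bucket names, so a bucket
    # that was not reported on a given date comes out as NaN.
    index = pd.Index(list(rows.keys()), name="date")
    df = pd.DataFrame(list(rows.values()), index=index, dtype=float)
    return df.sort_index()
```

Because the index is built from datetime.date objects, pandas keeps it as a plain object index rather than converting it to datetime.datetime values.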

Example data below:

AL_formatter(raw_json["USA"]["AL"]).head()
              0-4  18-24    25-49  5-17   50-64  65-100  unknown    5-24
date
2020-06-29  617.0    NaN  15123.0   NaN  7624.0  6510.0    272.0  6536.0
2020-06-30  639.0    NaN  15606.0   NaN  7817.0  6638.0     31.0  6805.0
2020-07-01  653.0    NaN  15984.0   NaN  7986.0  6787.0     29.0  7003.0
2020-07-02  677.0    NaN  16457.0   NaN  8208.0  6957.0     28.0  7277.0
2020-07-03  712.0    NaN  17178.0   NaN  8497.0  7196.0     65.0  7714.0
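A quick sanity check can help when writing a new formatter. The sketch below tests an output frame against the conventions described above; check_formatter_output is a hypothetical helper, not part of the project, and allowing an "unknown" column is an assumption based only on the example output.

```python
import datetime
import re

import pandas as pd

# Matches bucket names such as "0-4" or "65-100".
BUCKET_PATTERN = re.compile(r"^\d+-\d+$")


def check_formatter_output(df):
    """Return a list of problems found in a formatter's output (empty if none)."""
    problems = []
    bad_cols = [
        c for c in df.columns
        if c != "unknown" and not BUCKET_PATTERN.match(str(c))
    ]
    if bad_cols:
        problems.append(f"unexpected column names: {bad_cols}")
    # An exact type check rejects datetime.datetime and pandas Timestamp values.
    if not all(type(d) is datetime.date for d in df.index):
        problems.append("index entries must be datetime.date")
    if df.index.has_duplicates:
        problems.append("duplicate dates in index")
    return problems
```

For example, check_formatter_output(AL_formatter(raw_json["USA"]["AL"])) would be expected to return an empty list if the formatter follows these conventions.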