This project takes the raw data scraped from obastani/covid19demographics and formats it for processing, plotting, and publication. We currently focus only on plotting COVID-19 infection and death rates over time by age.
If using conda:
conda env create -f environment.yml
conda activate covid
If using venv and pip:
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
If using neither, make sure you have the packages listed in requirements.txt installed in your Python environment.
If you are not using Jupyter Lab, don't worry about the jupyterlab dependency. If you are using Jupyter Lab, set up your environment as described above, then install Jupyter Lab using your package manager of choice:
conda install jupyterlab
or
pip install jupyterlab
and launch the project with:
jupyter lab --notebook-dir="."
For every state in our dataset, we need a formatter that reads the raw data from obastani/covid19demographics and extracts the age-group panel data. Once a formatter has been written, add it to the code/formatters.py file and to the state_functions dictionary therein. See the current file for formatters that have already been completed.
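As a rough illustration of the registration step, the sketch below assumes that state_functions maps a state abbreviation to a formatter function. The key format, the XX_formatter stub, and the format_state helper are all hypothetical; check code/formatters.py for the convention actually in use.

```python
def XX_formatter(raw_state_json):
    """Hypothetical formatter for a made-up state "XX"."""
    # A real formatter would parse raw_state_json and return the formatted
    # DataFrame; this stub exists only to show the registration pattern.
    raise NotImplementedError("replace with real parsing logic")


# Assumed registry shape: state abbreviation -> formatter function.
state_functions = {
    "XX": XX_formatter,
}


def format_state(raw_json, state):
    """Apply the registered formatter to one state's slice of the raw JSON."""
    if state not in state_functions:
        raise KeyError(f"no formatter registered for {state!r}")
    return state_functions[state](raw_json["USA"][state])
```

Keeping the lookup in one dictionary means plotting code can loop over its keys without knowing which states have formatters yet.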
To get a sense of what a formatting function should do, refer to notebooks/SampleCode.ipynb. In brief, a formatter takes the raw JSON for a given state (e.g., raw_json["USA"]["AL"]) and returns a time series of either cases or deaths, with any duplicate or missing dates removed.
Columns should be named "{low}-{high}", one for each age bucket reported. Because states have changed their reporting buckets over time, a date on which a particular bucket was not reported should be filled with NaN for that bucket. The index of the returned dataframe should be the date in datetime.date format (not datetime.datetime).
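The output conventions above can be sketched as follows. This is not one of the project's formatters: it assumes the state's raw JSON has already been reduced to a hypothetical intermediate form, a list of (ISO date string, {bucket label: count}) pairs, since the raw layout differs by state. Keeping the first entry for a duplicated date and skipping entries without a date are also assumptions about how "removed" should be read.

```python
import datetime

import pandas as pd


def build_age_frame(records):
    """Build an age-bucket time series from (date_string, counts) pairs.

    `records` is a hypothetical intermediate form: each item is an ISO date
    string and a dict mapping "{low}-{high}" bucket labels to counts.
    """
    rows = {}
    for date_string, counts in records:
        if not date_string:
            continue  # assumption: entries with a missing date are skipped
        day = datetime.date.fromisoformat(date_string)
        # Assumption: the first entry seen for a date wins; later ones are dropped.
        rows.setdefault(day, counts)

    days = sorted(rows)
    # An object-dtype index keeps plain datetime.date values instead of
    # converting them to pandas timestamps.
    index = pd.Index(days, dtype=object, name="date")
    # Buckets that are absent on a given date become NaN in that row.
    return pd.DataFrame([rows[day] for day in days], index=index, dtype=float)
```

A state-specific formatter would then only need to produce the list of pairs and delegate the rest to a helper like this one.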
Example data below:
AL_formatter(raw_json["USA"]["AL"]).head()
date | 0-4 | 18-24 | 25-49 | 5-17 | 50-64 | 65-100 | unknown | 5-24 |
---|---|---|---|---|---|---|---|---|
2020-06-29 | 617.0 | NaN | 15123.0 | NaN | 7624.0 | 6510.0 | 272.0 | 6536.0 |
2020-06-30 | 639.0 | NaN | 15606.0 | NaN | 7817.0 | 6638.0 | 31.0 | 6805.0 |
2020-07-01 | 653.0 | NaN | 15984.0 | NaN | 7986.0 | 6787.0 | 29.0 | 7003.0 |
2020-07-02 | 677.0 | NaN | 16457.0 | NaN | 8208.0 | 6957.0 | 28.0 | 7277.0 |
2020-07-03 | 712.0 | NaN | 17178.0 | NaN | 8497.0 | 7196.0 | 65.0 | 7714.0 |
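One way to check a new formatter's output against these conventions is a small validation helper. The sketch below is an assumption rather than part of the project: check_formatted_frame is a hypothetical name, and treating "unknown" as the only permitted non-range column is inferred from the example table above.

```python
import datetime
import re

import pandas as pd

# Assumed column convention: "{low}-{high}" buckets plus an optional "unknown".
BUCKET_PATTERN = re.compile(r"^\d+-\d+$")


def check_formatted_frame(frame):
    """Return a list of problems found in a formatter's output (empty if none)."""
    if not isinstance(frame, pd.DataFrame):
        raise TypeError("expected a pandas DataFrame")

    problems = []
    # datetime.datetime subclasses datetime.date, so compare exact types.
    if not all(type(day) is datetime.date for day in frame.index):
        problems.append("index contains values that are not datetime.date")
    if frame.index.has_duplicates:
        problems.append("index contains duplicate dates")

    bad_columns = [
        column
        for column in frame.columns
        if column != "unknown" and not BUCKET_PATTERN.match(str(column))
    ]
    if bad_columns:
        problems.append(f"unexpected column names: {bad_columns}")
    return problems
```

For example, check_formatted_frame(AL_formatter(raw_json["USA"]["AL"])) should return an empty list if the formatter follows the conventions.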