This code estimates true new infections of COVID-19 over time in each state of the US based on a combination of wastewater data and seroprevalence data.
The code obtains the COVID-19 wastewater concentration data, smooths it, and scales it so that, during the early part of the pandemic, the resulting time series matches the one obtained by processing seroprevalence data. The output is true_new_infec_ww, which holds the time series of estimated new COVID-19 infections for each state, indexed by days since January 23, 2020.
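The scaling step can be illustrated with a minimal sketch. It assumes the simplest form of matching: a single multiplicative factor chosen so that the smoothed wastewater series and the seroprevalence-derived infection series have equal totals over the matching window. The function name scale_to_sero, its arguments, and the toy numbers are hypothetical and are not taken from Prevalence_ww.py, which may match the two series differently.

```python
import numpy as np

def scale_to_sero(ww, sero_new_infec, start, end):
    """Scale a wastewater series so its total over [start, end) equals
    the total of the seroprevalence-derived series over the same window.

    Assumption: a single constant factor is enough to align the series.
    """
    ww = np.asarray(ww, dtype=float)
    sero = np.asarray(sero_new_infec, dtype=float)
    ww_total = ww[start:end].sum()
    if ww_total == 0:
        raise ValueError("wastewater signal is zero over the matching window")
    factor = sero[start:end].sum() / ww_total
    # The same factor is applied to the whole series, including days
    # outside the matching window.
    return factor * ww

# Toy data for one state (made-up numbers)
ww = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sero = np.array([10.0, 20.0, 30.0, 0.0, 0.0])
scaled = scale_to_sero(ww, sero, start=0, end=3)
```

Applying the factor outside the matching window is what lets the wastewater signal stand in for infections after the seroprevalence survey period ends.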
The Python code uses the following libraries:
csaps==1.1.0
numpy==1.23.3
pandas==1.5.0
requests==2.28.1
scipy==1.10.0
wlag: The expected lag between the reported-cases time series and the wastewater time series
eq_start: The start date for matching against seroprevalence data
eq_end: The end date for matching against seroprevalence data
smooth_factor: Smoothing window in number of days
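As a rough illustration of how date and lag parameters like these could be applied to a daily series, the sketch below converts a calendar date to a day index and shifts a series by a fixed lag. It assumes that index 0 corresponds to January 23, 2020 and that a positive lag moves values later in time. Both conventions, the helper names day_index and shift_series, and the example values are assumptions, not the repository's actual defaults.

```python
from datetime import date

import numpy as np

ORIGIN = date(2020, 1, 23)  # assumption: day index 0 is January 23, 2020

def day_index(d):
    """Convert a calendar date to a day offset from ORIGIN."""
    return (d - ORIGIN).days

def shift_series(x, lag):
    """Shift x later by `lag` days (lag > 0) or earlier (lag < 0).

    Positions with no source value are filled with NaN.
    """
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    if lag > 0:
        out[lag:] = x[:-lag]
    elif lag < 0:
        out[:lag] = x[-lag:]
    else:
        out[:] = x
    return out

# Hypothetical values, not the repository's settings
eq_start = day_index(date(2020, 2, 1))
cases = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
shifted = shift_series(cases, 2)
```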
- Biobot.io COVID-19 Wastewater Concentration
- 2020-2021 Nationwide Blood Donor Seroprevalence Survey Infection-Induced Seroprevalence Estimates
us_states_population_data.txt: List of populations by state
us_states_abbr_list.txt: List of state abbreviations
fips_table.txt: FIPS information on US counties and states
python Prevalence_ww.py: Executes everything
CDC_Sero.py: Loads and processes seroprevalence data
latest_us_data.py: Loads recent time series of reported COVID-19 cases for US states
smooth_epidata.py: Preprocessing function to smooth and remove outliers from time series
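To make the smoothing and outlier-removal step concrete, here is a minimal sketch that uses a rolling median filter followed by a centered moving average. This is a swapped-in technique for illustration only: the repository lists csaps among its dependencies, so smooth_epidata.py may well use smoothing splines instead. The function name smooth_series, the n_mads threshold, and the default window are assumptions.

```python
import numpy as np
import pandas as pd

def smooth_series(values, window=7, n_mads=5.0):
    """Replace local outliers with the rolling median, then apply a
    centered moving average of `window` days."""
    s = pd.Series(values, dtype=float)
    # Local median and local median absolute deviation (MAD)
    med = s.rolling(window, center=True, min_periods=1).median()
    dev = (s - med).abs()
    mad = dev.rolling(window, center=True, min_periods=1).median()
    # A point is an outlier if it is far from the local median relative
    # to the local MAD; the clip avoids a zero threshold on flat data.
    is_outlier = dev > n_mads * mad.clip(lower=1e-9)
    cleaned = s.where(~is_outlier, med)
    smoothed = cleaned.rolling(window, center=True, min_periods=1).mean()
    return smoothed.to_numpy()

# Flat series with one reporting spike (made-up numbers)
raw = [10.0] * 15
raw[7] = 1000.0
smoothed = smooth_series(raw, window=7)
```

Removing the spike before averaging keeps a single bad report from being spread across the whole window.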
All the secondary files are called from the main script, Prevalence_ww.py, and the necessary files are stored as pickle objects in the Output_Pickles folder.
1. true_new_infec_ww.pkl
2. true_new_infec_final.pkl
3. un_array.pkl
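The outputs can be read back with the standard pickle module. The sketch below assumes the files sit in the Output_Pickles folder relative to the working directory. The helper load_output is hypothetical, and the type and shape of the stored objects are not documented here, so the example only prints the type. Pickle files should be loaded only from trusted sources.

```python
import pickle
from pathlib import Path

def load_output(name, folder="Output_Pickles"):
    """Load one pickled output by file name from the output folder."""
    with open(Path(folder) / name, "rb") as f:
        return pickle.load(f)

if __name__ == "__main__":
    path = Path("Output_Pickles") / "true_new_infec_ww.pkl"
    # Only attempt the load if the pipeline has already been run
    if path.exists():
        data = load_output(path.name)
        print(type(data))
```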