A Non-Mechanistic Approach to Real-Time Forecasting of US COVID-19 Mortality Using Machine Learning and Novel Digital Data
This model uses a gradient boosted regressor, tuned by hyperparameter optimization, trained on prior COVID-19 cases and deaths together with demographic, socioeconomic, mobility, and healthcare-related county-level covariates. It forecasts COVID-19 deaths at the county level and aggregates those forecasts to the state and national levels. The model does not account for state reopenings and closures directly; it captures them only indirectly, through changes in the mobility measures. Model predictions are evaluated using the COVID-19 Forecast Hub evaluation modules.
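A minimal sketch of this pipeline follows, under stated assumptions rather than as the project's actual code. It uses scikit-learn's `GradientBoostingRegressor` as a stand-in for the gradient boosted regressor (the project also lists XGBoost, which it may use instead). As one possible form of hyperparameter optimization, it uses `GridSearchCV` with time-ordered `TimeSeriesSplit` folds. It trains on a synthetic county-week panel whose feature names (`cases_prev`, `deaths_prev`, `mobility`, `median_age`) and `state_N` labels are hypothetical. County forecasts are then summed to state and national totals, which is assumed here to be the aggregation rule.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)

# Synthetic county-week panel (hypothetical; not real data).
# Rows are ordered by week, so TimeSeriesSplit validates on later rows than it trains on.
n_counties, n_weeks = 40, 12
week = np.repeat(np.arange(n_weeks), n_counties)
county = np.tile(np.arange(n_counties), n_weeks)
n = week.size
df = pd.DataFrame({
    "week": week,
    "county": county,
    "state": [f"state_{c % 4}" for c in county],
    "cases_prev": rng.poisson(50, n),        # prior-week cases (hypothetical feature)
    "deaths_prev": rng.poisson(5, n),        # prior-week deaths (hypothetical feature)
    "mobility": rng.normal(0.0, 1.0, n),     # mobility index (hypothetical feature)
    "median_age": rng.normal(40.0, 5.0, n),  # static covariate (hypothetical feature)
})
# Toy target: next-week deaths as a noisy, non-negative function of the features.
df["deaths_next"] = np.clip(
    0.05 * df["cases_prev"] + 0.5 * df["deaths_prev"] + 2.0 * df["mobility"]
    + rng.normal(0.0, 1.0, n),
    0.0, None,
)

FEATURES = ["cases_prev", "deaths_prev", "mobility", "median_age"]
train = df[df["week"] < n_weeks - 1]
test = df[df["week"] == n_weeks - 1].copy()

# Hyperparameter optimization: exhaustive search over a small grid,
# scored by mean absolute error on time-ordered validation folds.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={
        "n_estimators": [50, 100],
        "max_depth": [2, 3],
        "learning_rate": [0.05, 0.1],
    },
    cv=TimeSeriesSplit(n_splits=3),
    scoring="neg_mean_absolute_error",
)
search.fit(train[FEATURES], train["deaths_next"])

# County-level forecasts for the held-out week (deaths cannot be negative).
test["pred"] = np.clip(search.predict(test[FEATURES]), 0.0, None)

# Aggregate county forecasts to state and national totals by summation.
state_forecast = test.groupby("state")["pred"].sum()
national_forecast = state_forecast.sum()

print(search.best_params_)
print(state_forecast.round(1))
print(round(float(national_forecast), 1))
```

Because `TimeSeriesSplit` splits on row position, a fold boundary can fall partway through a week. Building the folds from whole weeks would be a stricter guard against leakage.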
Our forecasts are featured on the CDC website.
Source | Description | Scope |
---|---|---|
JHU CSSE | Daily US COVID-19 Deaths and Cases | County Level |
County-level Socioeconomic Dataset | 300 variables that summarize population estimates, demographics, ethnicity, housing, education, employment and income, climate, transit scores, and healthcare system-related metrics | County Level |
PlaceIQ | Exposure indices derived from PlaceIQ movement data | County Level |
US Dept of HHS | COVID-19 hospital capacity and utilization | State Level |
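The sources in the table above come at two scopes, so state-level data must be attached to each county before county-level modeling. The sketch below shows one plausible way to do this, not necessarily the project's. It assumes every table is a pandas DataFrame keyed by FIPS code, state, and date. The column names (`fips`, `median_age`, `covid_bed_share`), the `S1`/`S2` state codes, and all values are hypothetical toy data, not real figures.

```python
import pandas as pd

# Toy county-level counts keyed by a 5-digit FIPS code (hypothetical values).
county_counts = pd.DataFrame({
    "fips": ["00001", "00002", "00003"],
    "state": ["S1", "S1", "S2"],
    "date": pd.to_datetime(["2020-11-01"] * 3),
    "cases": [120, 340, 80],
    "deaths": [2, 5, 1],
})
# Toy static county covariates (one row per county).
county_covariates = pd.DataFrame({
    "fips": ["00001", "00002", "00003"],
    "median_age": [38.0, 43.0, 39.5],
})
# Toy state-level hospital utilization (one row per state and date).
state_hospital = pd.DataFrame({
    "state": ["S1", "S2"],
    "date": pd.to_datetime(["2020-11-01"] * 2),
    "covid_bed_share": [0.12, 0.09],
})

# County covariates join on FIPS (possibly many dates per county, one covariate row);
# each state-level value is broadcast to every county in that state on the same date.
features = (
    county_counts
    .merge(county_covariates, on="fips", how="left", validate="many_to_one")
    .merge(state_hospital, on=["state", "date"], how="left", validate="many_to_one")
)
print(features)
```

The `validate="many_to_one"` checks make pandas raise an error if a covariate or state table accidentally contains duplicate keys, which would otherwise silently duplicate county rows. County-level mobility indices could be joined the same way, on `["fips", "date"]`.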
The forecasting model was implemented in Python 3.6 using Jupyter notebooks. The Python packages used include NumPy, pandas, scikit-learn, and XGBoost.