Survival-Convolution Model for COVID-19 Prediction
The coronavirus disease COVID-19 has created major health crisis around the world. It is imperative to predict the disease epidemic, investigate the impacts of containment and mitigation measures on infection rates, and compare between countries. Existing methods for infectious disease modeling are SEIR models that rely on many untestable prior assumptions (e.g., fitting past influenza data) and unreliable with wide prediction intervals.
We develop a robust survival-convolution model with few parameters that incorporates the date of unknown patient zero, latent incubation periods, and time-varying reproduction numbers.
- Title: Survival-Convolution Models for Predicting COVID-19 Cases and Assessing Effects of Mitigation Strategies
- Authors: Qinxia Wanga, Shanghong Xiea, Yuanjia Wanga, and Donglin Zengb
- Institutes:
-
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
-
- Department of Biostatistics, Gillings School of Public Health, University of North Carolina at Chapal Hill, Chapal Hill, NC, USA
-
- Correspondonce to: Dr. Yuanjia Wang (yw2016@cumc.columbia.edu) and Dr. Donglin Zeng (dzeng@email.unc.edu)
- Manuscript: Wang Q, Xie S, Wang Y, and Zeng D (2020). Survival-Convolution Models for Predicting COVID-19 Cases and Assessing Effects of Mitigation Strategies. Frontiers in Public Health 8(2020) 325.
- Slides
Our model is also used by CDC for COVID-19 ensemble forecast.
Real Time Prediction
Note: Once the testing capacity is increased, the trend will change again. These are beyond what our model can predict. Since April 14 2020, CDC case counts include both confirmed and probable cases following new CDC guidelines.
Data source for US: JHU CSSE group
US Daily New Cases (with training data up to Mar 27, 2021):
Observed and predicted daily new cases, with a 95% prediction interval.
US Daily Inc Deaths:
The cumulative deaths by Mar 8, 2021 will be over 600k.
Data source for Italy: Worldmeters
Italy Daily New Cases (with training data up to April 29)::
Observed and predicted daily new cases and 95% prediction interval (shaded). First dashed line indicates the nation-wide lockdown (Mar 11). Second and third dashed line indicates two or four weeks after. Training data: Feb 20 to Apr 29; Testing data: Apr 30 to August 7.
Setup Requirements
- Python under version 3.7
- Tensorflow under version 2.1.0
Example
Input data
Suppose that we have an numpy.array
train_data
indicating observed daily confirmed/diagnosed cases starting from the first observed case. We will save it as "train_data.npy".
>>> train_data
array([ 1., 20., 0., 0., 18., 4., 3., 0.,
3., 5., 7., 25., 24., 34., 63., 98.,
116., 106., 163., 290., 307., 329., 553., 587.,
843., 983., 1750., 2950., 4569., 5632., 4848., 9400.,
10311., 11166., 13451., 17388., 18743., 19452., 20065., 20732.,
24914., 26655., 30107., 32454., 34196., 25400., 31240., 33502.,
31997., 33606., 33752.], dtype=float32)
Command
Now we want to fit this training data into our model. We need to decide the following arguments.
knots
- length of piecewise segments for the infection rate. E.g., with 81 training data, we decide a knot after 23 days, soknots
will be23,28
.input_path
- directory of the inputtrain_data.npy
.output_path
- directory to save output files.max_epochs
- number of epochs used in tensorflow optimizer (default: Adam), usually from 1000 to 3000 depending on numbers of knots and data size.initial_a
- initial values for the weightsa
. The length islen(knots)+1
.- Other arguments can be changed including
learning_rate
(default: 0.01)test_duration
- number of days to predict (default: 100)min_t0
,max_t0
- range of t0 to iterate (default: 1 to 21)
Then we can use the following command to run the model in our example.
python run_single_machine_model.py --knots=23,28 --input_path="example/train_data.npy" --output_path="example/output" --max_epochs=2000 --initial_a=0.5,0.5,0.5 --learning_rate=0.01
The running time increases as number of knots, number of epochs and data size increases. The code can be adjusted for using multiple CPU cores to reduce the time, using the following command.
python run_multi_machine_model.py --knots=23,28 --input_path="example/train_data.npy" --output_path="example/output" --max_epochs=2000 --initial_a=0.5,0.5,0.5 --learning_rate=0.01
Output
After running the above command, we should have the following output files in output_path
.
best_t0.npy
- estimated t0best_weights.npy
- estimateda
. E.g., in our example, the estimates fora
contains three values. The infection rate is constant before the first observed case with value a1 and then linearly change to a2 during the first 23 days and from a2 to a3 in the next 28 days.predicted_infection_rate.npy
- infection rate from the first observed case.predicted_reproduction_number.npy
- reproduction number from the first observed case.predicted_daily_observed.npy
- daily new observed cases from the first observed case.predicted_daily_infected.npy
- cumulative latent cases on each day from the first observed case.