CIS520 FINAL PROJECT!
Team Members
- Dinesh Jagai dinesh97@seas.upenn.edu
- Pranav Panganamamula ppranav@seas.upenn.edu
- Julian P. Schnitzler schnitzl@seas.upenn.edu
Project Title
Predicting the spread of dengue virus in San Juan and Iquitos over a five-year period
Goal
To predict the number of dengue cases each week (in each location) based on environmental variables describing changes in temperature, precipitation, vegetation, and more.
Objectives
- Motivation (adapted from DengAI)
- Dengue fever is a mosquito-borne disease that occurs in tropical and sub-tropical parts of the world. In mild cases, symptoms are similar to the flu: fever, rash, and muscle and joint pain. In severe cases, dengue fever can cause severe bleeding, low blood pressure, and even death.
- Because it is carried by mosquitoes, the transmission dynamics of dengue are related to climate variables such as temperature and precipitation. Although the relationship to climate is complex, a growing number of scientists argue that climate change is likely to produce distributional shifts that will have significant public health implications worldwide.
- In recent years, dengue fever has been spreading. Historically, the disease has been most prevalent in Southeast Asia and the Pacific islands. These days, many of the nearly half a billion cases per year occur in Latin America.
- Our goal is to predict the number of dengue cases each week (in each location) based on environmental variables describing changes in temperature, precipitation, vegetation, and more.
- DATASET
- Sets of features (N = 1456, p = 22)
- City and date indicators
city – City abbreviations: sj for San Juan and iq for Iquitos
week_start_date – Date given in yyyy-mm-dd format
- NOAA's GHCN daily climate data weather station measurements
station_max_temp_c – Maximum temperature
station_min_temp_c – Minimum temperature
station_avg_temp_c – Average temperature
station_precip_mm – Total precipitation
station_diur_temp_rng_c – Diurnal temperature range
- PERSIANN satellite precipitation measurements (0.25x0.25 degree scale)
precipitation_amt_mm – Total precipitation
- NOAA's NCEP Climate Forecast System Reanalysis measurements (0.5x0.5 degree scale)
reanalysis_sat_precip_amt_mm – Total precipitation
reanalysis_dew_point_temp_k – Mean dew point temperature
reanalysis_air_temp_k – Mean air temperature
reanalysis_relative_humidity_percent – Mean relative humidity
reanalysis_specific_humidity_g_per_kg – Mean specific humidity
reanalysis_precip_amt_kg_per_m2 – Total precipitation
reanalysis_max_air_temp_k – Maximum air temperature
reanalysis_min_air_temp_k – Minimum air temperature
reanalysis_avg_temp_k – Average air temperature
reanalysis_tdtr_k – Diurnal temperature range
- Satellite vegetation: Normalized Difference Vegetation Index (NDVI), from NOAA's CDR NDVI measurements (0.5x0.5 degree scale)
ndvi_se – Pixel southeast of city centroid
ndvi_sw – Pixel southwest of city centroid
ndvi_ne – Pixel northeast of city centroid
ndvi_nw – Pixel northwest of city centroid
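The feature groups above mix units: station temperatures are in degrees Celsius, while reanalysis temperatures are in kelvin. A minimal sketch of how rows might be parsed and brought to a common unit is shown here. The sample values are made up for illustration, and the helper names (`kelvin_to_celsius`, `load_rows`) and the derived field `reanalysis_air_temp_c` are hypothetical; only the column names come from the feature list.

```python
import csv
import io
from datetime import date

# Toy rows with made-up values; only the column names come from the feature list.
SAMPLE_CSV = """city,week_start_date,station_avg_temp_c,reanalysis_air_temp_k
sj,1990-04-30,25.4,297.57
iq,2000-07-01,26.4,296.74
"""


def kelvin_to_celsius(kelvin):
    """Convert a temperature from kelvin to degrees Celsius."""
    return kelvin - 273.15


def load_rows(text):
    """Parse CSV text into typed dicts, grouped by city code (sj or iq)."""
    by_city = {}
    for row in csv.DictReader(io.StringIO(text)):
        parsed = {
            # week_start_date is given in yyyy-mm-dd format.
            "week_start_date": date.fromisoformat(row["week_start_date"]),
            "station_avg_temp_c": float(row["station_avg_temp_c"]),
            # Reanalysis temperatures are in kelvin; convert so both sources share a unit.
            "reanalysis_air_temp_c": kelvin_to_celsius(float(row["reanalysis_air_temp_k"])),
        }
        by_city.setdefault(row["city"], []).append(parsed)
    return by_city


rows_by_city = load_rows(SAMPLE_CSV)
```

Grouping by city up front reflects the fact that the two locations are reported separately; whether to model them jointly or separately is left open here.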
Related Work
- https://ieeexplore.ieee.org/abstract/document/7912315 (This paper, "Disease Prediction by Machine Learning Over Big Data From Healthcare Communities", uses a CNN-based multimodal disease risk prediction algorithm.)
- https://bmcinfectdis.biomedcentral.com/articles/10.1186/s12879-019-3874-x (This paper discusses how to predict dengue outbreaks based on disease surveillance, meteorological, and socio-economic data. It uses a Quasi-Poisson regression, in which the variance of the count data (dengue counts) is assumed to be a linear function of the mean, to predict dengue cases.)
- https://towardsdatascience.com/dengue-fever-and-how-to-predict-it-a32eab1dbb18 (An article on predicting dengue using different regression methods. It is fairly similar to what we are trying to do, but our approach has more parameters and data, and we plan to use deep learning in addition to regression.)
- https://pdfs.semanticscholar.org/1c31/0e22fe1a0418aaa25fcc629736b5a46655f8.pdf (This paper develops a dengue possibility forecasting model using machine learning algorithms. Specifically, it uses a Gradient Boosting Regression ensemble method to predict the possibility of a dengue outbreak.)
- https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0005973 (This paper examines predicting the number of dengue cases in China. The candidate models were support vector regression (SVR), a step-down linear regression model, a gradient boosted regression tree algorithm (GBM), a negative binomial regression model (NBM), a least absolute shrinkage and selection operator (LASSO) linear regression model, and a generalized additive model (GAM). The SVR model outperformed the other forecasting techniques assessed in the study.)
- http://www.imedpub.com/conference-abstracts-files/deep-learning-applications-for-predicting-dengue-fever.pdf (This paper looks at using deep learning to predict the number of dengue cases in Taiwan.)
- https://www.researchgate.net/profile/Yuhanis_Yusof/publication/272912479_Dengue_Outbreak_Prediction_A_Least_Squares_Support_Vector_Machines_Approach/links/5691be4f08ae0f920dcb9058/Dengue-Outbreak-Prediction-A-Least-Squares-Support-Vector-Machines-Approach.pdf (This paper uses Least Squares Support Vector Machines (LS-SVM) to predict future dengue outbreaks in Malaysia.)
- https://www.biorxiv.org/content/biorxiv/early/2019/09/06/760702.full.pdf (This paper develops a dengue prediction model using Long Short-Term Memory (LSTM) neural networks.)
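For reference, the Quasi-Poisson assumption mentioned in the BMC Infectious Diseases entry above can be written compactly. The symbols are our own notation, not the paper's: Y_t is the weekly case count, x_t the feature vector, beta the regression coefficients, and phi a dispersion parameter. The log link is the usual choice for Poisson-type models and is assumed here.

```latex
\mathbb{E}[Y_t \mid x_t] = \mu_t = \exp\!\left(x_t^{\top}\beta\right),
\qquad
\operatorname{Var}(Y_t \mid x_t) = \phi\,\mu_t
```

Setting phi = 1 recovers ordinary Poisson regression; phi > 1 allows for the overdispersion that is common in disease count data.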
Problem Formulation
- Predict the number of dengue cases each week in San Juan and Iquitos over a five-year period (2008-2013), using the given environmental variables describing changes in temperature, precipitation, vegetation, and more from 1990.
Methods
- Use an imputation method to clean the data
- Use simple regression methods to predict the number of cases based on our 22 features
- Use deep learning with neural networks for a more robust approach
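The first two steps could look roughly like the sketch below. It rests on assumptions the proposal does not fix: forward fill stands in for the unspecified imputation method, ordinary least squares stands in for "simple regression", and all numbers are toy values made up for illustration. The function names are hypothetical.

```python
import numpy as np


def forward_fill(column):
    """Replace each NaN with the most recent non-NaN value before it.

    Leading NaNs, which have no earlier value, take the first observed value.
    """
    filled = np.array(column, dtype=float)
    observed = np.flatnonzero(~np.isnan(filled))
    if observed.size == 0:
        raise ValueError("column has no observed values")
    filled[: observed[0]] = filled[observed[0]]
    for i in range(observed[0] + 1, filled.size):
        if np.isnan(filled[i]):
            filled[i] = filled[i - 1]
    return filled


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, weights...]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_cases(coef, X):
    """Predict weekly counts, rounded to whole cases and clipped at zero."""
    design = np.column_stack([np.ones(len(X)), X])
    return np.clip(np.rint(design @ coef), 0, None).astype(int)


# Toy example with made-up numbers: one feature with gaps, one target.
temp = forward_fill([np.nan, 25.0, np.nan, 27.0, 28.0])
cases = np.array([4.0, 4.0, 4.0, 8.0, 10.0])
coef = fit_linear(temp.reshape(-1, 1), cases)
preds = predict_cases(coef, temp.reshape(-1, 1))
```

Rounding and clipping the predictions reflects that case counts are non-negative integers; a count model such as Poisson regression would handle this more directly.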
Evaluation
- Project Plan
Week 1 11/4
- Clean the data
- Impute the missing data
- Visualize the data
Week 2 11/11
- Work on minimizing the loss on the training set using different regression methods
Week 3 11/18
- Work on minimizing the loss on the test set using different regression methods
- Use different cross-validation methods to select any needed parameters
- Research different deep learning methods
Week 4 11/25
- Use a neural network (deep learning)
Week 5 12/2
- Work on Presentation
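The Week 3 step of using cross-validation to select parameters might be sketched as follows. This is only one possible setup: ridge regression and its penalty are assumptions chosen for illustration (the plan does not name a model or parameter), the data is synthetic, and the function names are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, penalty):
    """Closed-form ridge regression on centered data; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    weights = np.linalg.solve(Xc.T @ Xc + penalty * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return intercept, weights


def cv_error(X, y, penalty, n_folds=5):
    """Mean squared validation error, averaged over contiguous folds."""
    indices = np.arange(len(y))
    errors = []
    for held_out in np.array_split(indices, n_folds):
        train = np.setdiff1d(indices, held_out)
        intercept, weights = ridge_fit(X[train], y[train], penalty)
        residual = y[held_out] - (intercept + X[held_out] @ weights)
        errors.append(np.mean(residual ** 2))
    return float(np.mean(errors))


def select_penalty(X, y, candidates, n_folds=5):
    """Return the candidate penalty with the lowest cross-validated error."""
    scores = {p: cv_error(X, y, p, n_folds) for p in candidates}
    return min(scores, key=scores.get), scores


# Synthetic data, only to exercise the functions above.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)
best_penalty, scores = select_penalty(X, y, candidates=[0.01, 1.0, 100.0])
```

Contiguous folds keep neighboring weeks together rather than shuffling them. Because the data is a weekly time series, a forward-chaining split that trains only on earlier weeks is another option worth comparing.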