covid19-global

Kaggle COVID-19 global forecasting challenge

Kaggle Challenge: COVID-19 Global Forecasting

Please visit the Kaggle page for COVID-19 Global Forecasting for more information about the contest.

Some useful Kaggle discussions:

Getting started

Setting up environment

Install the latest Anaconda Distribution, then run the following commands:

conda remove -y --name kaggle-covid --all
conda create -n kaggle-covid -y python=3.7 pandas numpy scipy statsmodels scikit-learn matplotlib seaborn ipykernel
conda activate kaggle-covid
conda install -y -c pyviz holoviews bokeh
pip install autopep8
ipython kernel install --user --name=kaggle-covid
conda deactivate
conda activate base
jupyter notebook --notebook-dir="./" --NotebookApp.port=8888
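
Once the notebook server is running, it can help to confirm that the `kaggle-covid` kernel actually sees the packages installed above. The snippet below is a minimal sketch, not part of the original setup: the helper name `check_packages` and the `REQUIRED` list are assumptions made for illustration, and the list uses import names (scikit-learn is imported as `sklearn`).

```python
import importlib.util
from typing import Dict, Iterable

# Import names for the packages installed above.
# Assumption: scikit-learn is listed under its import name, "sklearn".
REQUIRED = [
    "pandas", "numpy", "scipy", "statsmodels", "sklearn",
    "matplotlib", "seaborn", "holoviews", "bokeh",
]


def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level import name to True if it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Run this in a notebook cell with the kaggle-covid kernel selected.
    for name, ok in check_packages(REQUIRED).items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```

If any package is reported as missing, the notebook is most likely running on the `base` kernel rather than `kaggle-covid`.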

Kaggle Competition Rules

The competition rules can be found at the link below. Note that the competition permits the use of both competition data and external data. As described in the rules section of the Kaggle competition page, external data must be published on the official competition forum before the entry deadline. For this particular challenge, the entry deadline falls on the same day as the submission deadline, March 25th, so we don’t have to rush to enter for the first week. However, please keep in mind that any external data we find or generate must be posted before the deadline. Also, if you decide to contribute to the competition, make sure you join the team before March 25th for the first-week competition.

Kaggle docs: https://www.kaggle.com/docs

Datasets

Completed datasets

Uncollected / Unfinished datasets

  • Social Media
  • Policies
    • ideally, we would know how much people are going out
  • Treatment Options
    • Vaccine trial stages

*All datasets from Kaggle can be found on the public dataset-sharing discussion board. Any "private" external datasets should also be posted on that board.

Relevant Readings

Papers

Guides

"Defining the model to predict the difference in values between time steps rather than the value itself, is a much stronger test of the models predictive powers."

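One way to read the quoted advice about predicting differences: a naive forecast that simply repeats today's value can look accurate on cumulative counts, yet the same forecast predicts zero change every day once the target is the day-to-day difference. The sketch below illustrates this with made-up numbers; the helper names (`to_differences`, `from_differences`, `mean_abs_error`) and the `cases` series are hypothetical and not taken from the competition data.

```python
from typing import List, Sequence


def to_differences(levels: Sequence[float]) -> List[float]:
    """Day-over-day change; the result is one element shorter than the input."""
    return [b - a for a, b in zip(levels, levels[1:])]


def from_differences(last_level: float, diffs: Sequence[float]) -> List[float]:
    """Rebuild levels by adding predicted changes to the last known level."""
    out = []
    running = last_level
    for d in diffs:
        running += d
        out.append(running)
    return out


def mean_abs_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up cumulative counts, for illustration only.
cases = [100, 110, 125, 145, 170, 200]

# Naive "persistence" forecast on levels: tomorrow equals today.
level_actual = cases[1:]
level_pred = cases[:-1]

# The same forecast expressed on differences: zero change every day.
diff_actual = to_differences(cases)
diff_pred = [0.0] * len(diff_actual)

level_mae = mean_abs_error(level_actual, level_pred)
diff_mae = mean_abs_error(diff_actual, diff_pred)

# Error relative to the average size of the target being predicted.
level_rel = level_mae / (sum(level_actual) / len(level_actual))
diff_rel = diff_mae / (sum(diff_actual) / len(diff_actual))

print(f"levels:      MAE={level_mae:.1f}, relative={level_rel:.2f}")
print(f"differences: MAE={diff_mae:.1f}, relative={diff_rel:.2f}")
```

The absolute error is the same in both views, but relative to the size of the target the naive forecast looks far worse on differences. Predicted differences can be turned back into cumulative values with `from_differences`.
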
Our methods

Methods are preliminary; the final approach will depend on how the models perform.

Clustered Time Series

  1. Clustering cities and countries
    • type of response
    • healthcare system
    • transportation (air travel)
    • population density
    • etc...
  2. Train time-series models by cluster
    • train on difference in value
    • evaluate against historical mean
    • this helps us separate cities/countries with different environments and responses
  3. Combine model predictions
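
The three steps above can be sketched end to end as follows. This is only an illustration under several assumptions, not the team's actual method: the region features and case series are made up, clustering uses a plain k-means written out by hand (in the project environment, scikit-learn would be the more practical choice), the per-cluster model is a pooled AR(1) fit on differences, "evaluate against historical mean" is interpreted as comparing with a mean-of-past-differences baseline, and "combine" is interpreted as collecting every region's forecast into one table.

```python
from typing import Dict, List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points: List[Sequence[float]], k: int, iters: int = 10) -> List[int]:
    """Plain k-means; centers start at the first k points (deterministic)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def diffs(levels: Sequence[float]) -> List[float]:
    return [b - a for a, b in zip(levels, levels[1:])]


def fit_ar1(diff_series: List[List[float]]) -> float:
    """Pooled least-squares fit of d_t = phi * d_(t-1) across several series."""
    num = den = 0.0
    for d in diff_series:
        for prev, cur in zip(d, d[1:]):
            num += prev * cur
            den += prev * prev
    return num / den if den else 0.0


def run_pipeline(features: Dict[str, Sequence[float]],
                 series: Dict[str, Sequence[float]], k: int) -> Dict[str, dict]:
    names = sorted(features)
    # Step 1: cluster regions on their descriptive features.
    labels = kmeans_labels([features[n] for n in names], k)
    forecasts = {}
    for c in sorted(set(labels)):
        members = [n for n, lab in zip(names, labels) if lab == c]
        # Step 2: train one model per cluster on differences,
        # holding out each region's last observation for evaluation.
        train = {n: diffs(series[n][:-1]) for n in members}
        phi = fit_ar1(list(train.values()))
        for n in members:
            actual = series[n][-1] - series[n][-2]
            model_pred = phi * train[n][-1]
            baseline_pred = sum(train[n]) / len(train[n])  # historical mean
            # Step 3: combine per-cluster predictions into one table.
            forecasts[n] = {
                "cluster": c,
                "forecast": series[n][-2] + model_pred,
                "model_abs_err": abs(actual - model_pred),
                "baseline_abs_err": abs(actual - baseline_pred),
            }
    return forecasts


# Made-up data: two features per region (e.g. scaled density, air travel).
features = {"A": [0.9, 0.8], "B": [0.85, 0.75], "C": [0.1, 0.2], "D": [0.15, 0.1]}
series = {
    "A": [1, 2, 4, 8, 16, 32],
    "B": [2, 4, 8, 16, 32, 64],
    "C": [10, 11, 12, 13, 14, 15],
    "D": [5, 7, 9, 11, 13, 15],
}

results = run_pipeline(features, series, k=2)
for name, row in results.items():
    print(name, row)
```

Pooling the fit within a cluster lets regions with few observations borrow strength from similar regions, which is the motivation for clustering before training.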

LSTM Neural Network

TBD

Hybrid Gradient Boosting Tree and LSTM

TBD

Spatio-Temporal Neural Networks