CoViD-19 Data Analysis

Some data analysis in Python around the covid-19 data, including survival analysis with the Kaplan-Meier estimator.

Data Sources:

The first notebook just visualizes the numbers on a daily basis for a few regions: covid-19-data-analysis.ipynb

The second notebook does some survival analysis via Kaplan-Meier: covid-19-survival.ipynb. This was especially useful at the start of the infection wave, when statistics were sparse. Now that data is abundant, the crude CFR (case fatality rate), obtained by dividing the number of deaths by the total number of known cases, is already a very good estimate.
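To make the two quantities above concrete, here is a minimal sketch in plain Python. It is not the code from covid-19-survival.ipynb: the function names and the toy data are hypothetical, and it assumes each case is recorded as a duration in days plus a flag saying whether the death was observed (True) or the case was censored, i.e. still under observation (False).

```python
from collections import Counter


def crude_cfr(deaths, known_cases):
    """Crude case fatality rate: deaths divided by all known cases."""
    return deaths / known_cases


def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate.

    Returns a list of (time, survival probability) pairs, one for each
    time at which at least one event (death) was observed.
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # events and censored cases both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of cases at risk that survive time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: days under observation and whether death was observed.
toy_durations = [2, 3, 3, 5, 8, 8, 10]
toy_observed = [True, False, True, True, False, True, False]

toy_curve = kaplan_meier(toy_durations, toy_observed)
toy_cfr = crude_cfr(sum(toy_observed), len(toy_observed))
```

With sparse data many cases are still open, so the censoring-aware estimate and the crude ratio can differ noticeably; as more cases are resolved the two tend to move closer together.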

The idea for the third notebook comes from Markus Noga's notebooks: covid19-analysis. I fit a sigmoid and an exponential function to the data for Austria and Germany (via PyMC3) and then perform a model comparison to see which of the two models describes the data better: covid-19-data-analysis-forecasting.ipynb. In both cases the sigmoid model is more likely, and as of today (2020-04-01) the model puts the inflexion point at 2020-03-26. If the model is right, we should see a maximum of 15'000 cases in Austria and a maximum of 85'000 cases in Germany.
You can find an animated version of how well this prediction model works for Germany here: SARS-CoV-2 Fälle und Prognosemodell
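
For readers unfamiliar with the two candidate curves, the following sketch only evaluates them; it does not reproduce the PyMC3 fit or the model comparison from the notebook. The parameter names, the growth rate, and the day index of the inflexion point are illustrative assumptions. Only the plateau of 85'000 cases is taken from the text above.

```python
import math


def sigmoid(t, maximum, rate, t_inflexion):
    """Logistic curve: saturates at `maximum`, grows fastest at `t_inflexion`."""
    return maximum / (1.0 + math.exp(-rate * (t - t_inflexion)))


def exponential(t, scale, rate):
    """Unbounded exponential growth."""
    return scale * math.exp(rate * t)


# Illustrative values (assumptions, not fitted results).
MAXIMUM = 85_000.0   # plateau of the sigmoid, as quoted for Germany
RATE = 0.2           # hypothetical growth rate per day
T_INFLEXION = 30.0   # hypothetical day index of the inflexion point

# Choose the exponential's scale so both curves agree in the early phase.
SCALE = MAXIMUM * math.exp(-RATE * T_INFLEXION)

for day in (0, 15, 30, 45, 60):
    s = sigmoid(day, MAXIMUM, RATE, T_INFLEXION)
    e = exponential(day, SCALE, RATE)
    print(f"day {day:2d}: sigmoid {s:10.0f}  exponential {e:12.0f}")
```

Early on the two curves are almost indistinguishable, which is why a formal model comparison is needed. At the inflexion point the sigmoid has reached exactly half of its maximum, and after that it flattens while the exponential keeps growing.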

Read more about the background in the associated blog post here: CoViD-19 Data Analysis.

Note / Caveat

It seems that the ?flush_cache=true flag for Jupyter nbviewer no longer works. Therefore, clicking on the above links may show outdated results. You can easily spot this if the "last updated:" value at the very top of the notebook is older than the last commit (just compare this value to the values you see when clicking on the GitHub links below).

To be sure you get the latest versions, you can either look at the notebooks on GitHub (where they are not rendered as nicely):

Or you can check out the repository and run the notebooks locally on your machine.