/backblaze-survival

Survival analysis of Backblaze hard drives

Primary LanguageJupyter Notebook

Survival Analysis: Backblaze Hard Drives

This is the code for my survival analysis of the Backblaze hard drive dataset.

To run the notebook, you will need to start JupyterLab in a special way that gives it access to the spark global variable of PySpark. The command for this is in the launch-notebook.sh shell script.

This requires Apache Spark (and the PySpark binding), which can be installed via Homebrew or Conda. I also use the pandas, lifelines, and humanize packages from PyPI, installed via pip.

Data files are not included. These can be obtained from Backblaze's website. Place downloaded files in a data subfolder and rename them using the convention described in the notebook.