Papermill demo
This repository contains a demo of Papermill running on NeSI.
The goal is to demonstrate how to use it to run a series of notebooks non-interactively on an HPC platform.
Installation
From a terminal running on NeSI, clone this repository:
git clone https://github.com/nesi/papermill_demo
Create a Conda environment:
module purge && module load Miniconda3/4.10.3
conda env create -p venv -f environment.lock.yml
Install a Jupyter kernel for the notebooks:
module purge && module load JupyterLab
nesi-add-kernel -p ./venv papermill_demo cuDNN/8.1.1.33-CUDA-11.2.0
Note: The environment.lock.yml file pins the versions of all installed packages for reproducibility. It was generated by creating a conda environment from the environment.yml file and exporting it as follows:
conda env export -p venv --no-builds | sed "/^name/d; /^prefix/d" > environment.lock.yml
Demo
This repository contains 3 notebooks:
- preprocessing.ipynb downloads the MNIST dataset and splits it into train/test sets,
- model_fitting.ipynb fits a simple MLP model on the prepared MNIST dataset,
- keras_model.ipynb fits an MLP model on the MNIST dataset using Keras and Keras Tuner.
The following example illustrates how to run the first two notebooks non-interactively with different parameters.
Activate the Conda environment:
module purge && module load Miniconda3/4.10.3
source $(conda info --base)/etc/profile.d/conda.sh
conda activate ./venv
First, let's run the preprocessing notebook, saving the dataset in the results folder:
papermill -k papermill_demo -p result_file results/dataset.npz \
notebooks/preprocessing.ipynb results/preprocessing.ipynb
The -p result_file results/dataset.npz option injects this parameter into the notebook. Note that the -k papermill_demo option sets the Jupyter kernel used to run the notebook.
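Under the hood, Papermill looks for a cell tagged "parameters" in the notebook and inserts a new cell right after it containing the injected values, which then override the defaults. A minimal sketch of what such a cell might contain in preprocessing.ipynb (illustrative; the actual cell contents may differ):

# Cell tagged "parameters" in notebooks/preprocessing.ipynb (illustrative).
# Papermill inserts an "injected-parameters" cell right after this one,
# so values passed with -p override these defaults.
result_file = "results/dataset.npz"  # default, overridden by -p result_file ...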
Next, we can run the model fitting notebook with a set of parameters injected from a file, using the -f option:
papermill -k papermill_demo \
-p input_file results/dataset.npz \
-f config/short_run.yaml \
notebooks/model_fitting.ipynb \
results/model_fitting_short.ipynb
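The file passed to -f is a plain YAML mapping of parameter names to values. The real parameters are defined in config/short_run.yaml in the repository; a purely hypothetical example, assuming the notebook exposes n_epochs and batch_size parameters, could look like:

# Hypothetical contents; see config/short_run.yaml for the actual parameters.
n_epochs: 2
batch_size: 64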
To run it on a larger node, we can use a Slurm script to request the relevant resources:
sbatch slurm/fit_long_run.sl
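For reference, here is a minimal sketch of what such a script could contain; the resource requests, config file, and output name below are assumptions, and the actual slurm/fit_long_run.sl in the repository is authoritative:

#!/bin/bash -e
# Hypothetical resource requests; see slurm/fit_long_run.sl for the real ones.
#SBATCH --job-name=fit_long_run
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
module purge && module load Miniconda3/4.10.3
source $(conda info --base)/etc/profile.d/conda.sh
conda activate ./venv
papermill -k papermill_demo \
    -p input_file results/dataset.npz \
    -f config/long_run.yaml \
    notebooks/model_fitting.ipynb \
    results/model_fitting_long.ipynb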
and check whether the job is running using squeue:
squeue -u "$USER"
Finally, let's combine everything with Snakemake to build a workflow that runs multiple configurations as Slurm jobs:
snakemake --profile nesi
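The --profile nesi option points Snakemake at a profile, i.e. a folder of configuration files that tells it how to submit each rule as a Slurm job. The workflow itself lives in the repository's Snakefile; an illustrative rule (with hypothetical names, the real Snakefile may differ) running one notebook execution per config file could look like:

# Hypothetical Snakemake rule; see the repository's Snakefile for the
# actual workflow. The {run} wildcard maps one config file to one output.
rule model_fitting:
    input:
        notebook="notebooks/model_fitting.ipynb",
        dataset="results/dataset.npz",
        config="config/{run}.yaml"
    output:
        "results/model_fitting_{run}.ipynb"
    shell:
        "papermill -k papermill_demo -p input_file {input.dataset} "
        "-f {input.config} {input.notebook} {output}"

Running snakemake --profile nesi then submits one Slurm job per matching configuration.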