
Example of GPU-accelerated machine learning on the RCS compute cluster using Jupyter and conda

Primary language: Jupyter Notebook; License: Apache License 2.0 (Apache-2.0)

Pacemakers

These instructions describe how to run a lightly modified version of the Pacemakers Kaggle kernel on Imperial College's HPC system.

Platforms such as Kaggle, Colab and Azure Notebooks are great for sharing notebooks, but there are advantages to using the RCS Compute Service for your research:

  • Your data remains inside the College, stored on the Research Data Store (RDS)
  • You can run long, non-interactive and/or parallel jobs (see below)
  • You have access to multi-GPU nodes and several GPU models

Setup

  1. Clone or download this repository to the HPC system
  2. Download the data into Train and Test folders inside the repository directory, next to pacemakers.ipynb
  3. Create a conda environment with the required dependencies: conda env create --file environment.yml
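Before starting a server or submitting a job, it can save a wasted queue slot to check that the setup steps left the expected files in place. This is a minimal sketch, assuming the layout described above (Train and Test folders plus environment.yml and pacemakers.ipynb in one directory); the helper name missing_setup_items is hypothetical and not part of the repository.

```python
from pathlib import Path

# Hypothetical helper. The names below come from the setup steps; adjust
# them if your copy of the repository is laid out differently.
REQUIRED_DIRS = ("Train", "Test")
REQUIRED_FILES = ("environment.yml", "pacemakers.ipynb")


def missing_setup_items(repo_dir):
    """Return the expected folders/files that are absent from repo_dir."""
    root = Path(repo_dir)
    missing = [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
    missing += [name for name in REQUIRED_FILES if not (root / name).is_file()]
    return missing


if __name__ == "__main__":
    problems = missing_setup_items(".")
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("Repository layout looks complete.")
```

Run it from the repository directory; an empty result means the data folders and files are where the notebook expects them, under the stated assumption.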

Execute

To run the notebook interactively in Jupyter (on an NVIDIA P1000 GPU):

  1. Visit the RCS Jupyter Service
  2. Create a new server (GPU recommended)
  3. Open pacemakers.ipynb and run the notebook
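To confirm from a notebook cell that the server you created actually has a GPU attached, you can ask nvidia-smi which devices it sees. This is a minimal sketch, assuming nvidia-smi is on the PATH of GPU servers; the helper names visible_gpus and parse_gpu_names are hypothetical. It returns an empty list on a CPU-only server instead of raising.

```python
import shutil
import subprocess


def parse_gpu_names(csv_text):
    """Turn 'nvidia-smi --query-gpu=name --format=csv,noheader' output into a list."""
    return [line.strip() for line in csv_text.splitlines() if line.strip()]


def visible_gpus():
    """List the GPUs nvidia-smi reports, or [] if it is missing or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # e.g. a CPU-only server
    result = subprocess.run(
        [exe, "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_names(result.stdout)


print(visible_gpus())
```

On a GPU server the list should contain one entry per visible device; an empty list suggests the server was created without a GPU.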

To run the notebook as a batch job (on an NVIDIA P100 GPU):

  1. Submit the job: qsub pacemakers.pbs.sh
  2. When the job completes, visit the RCS Jupyter Service
  3. Open pacemakers.ipynb and review the outputs
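The contents of pacemakers.pbs.sh are not reproduced here, but because the outputs are reviewed in the notebook itself, the job presumably executes the notebook in place. The sketch below generates a job script of that shape. It is a hypothetical illustration: the resource request (select/ncpus/mem/ngpus/gpu_type), the module name anaconda3/personal, the environment name and the walltime are all assumptions to check against the repository's own script and the RCS documentation.

```python
# Hypothetical sketch of a PBS job script that runs a notebook in place.
# Resource names, module name and environment name are assumptions.
def make_pbs_script(notebook="pacemakers.ipynb", env_name="pacemakers",
                    walltime="02:00:00", ncpus=4, mem_gb=24,
                    ngpus=1, gpu_type="P100"):
    """Return the text of a PBS job script that executes a notebook in place."""
    lines = [
        "#!/bin/bash",
        f"#PBS -l walltime={walltime}",
        f"#PBS -l select=1:ncpus={ncpus}:mem={mem_gb}gb:ngpus={ngpus}:gpu_type={gpu_type}",
        "",
        # PBS starts jobs in $HOME; move back to the directory qsub was run from.
        'cd "$PBS_O_WORKDIR"',
        "module load anaconda3/personal",  # assumed module name
        f"source activate {env_name}",  # assumed environment name
        # --inplace overwrites the notebook with its executed outputs so they
        # can be reviewed later; timeout=-1 disables the per-cell time limit.
        "jupyter nbconvert --to notebook --execute --inplace "
        f"--ExecutePreprocessor.timeout=-1 {notebook}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_pbs_script(), end="")
```

Executing the notebook in place, rather than writing a copy, is what lets the final step open pacemakers.ipynb and find the results already there.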

Multi-GPU execution

This repository has two branches. The master branch (the default) targets a single GPU. The multi-gpu branch uses DataParallel to target two GPUs. You can see the relevant modifications by comparing the branches.
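Assuming DataParallel here is PyTorch's torch.nn.DataParallel, it replicates the model on each listed GPU, splits every input batch along its first dimension, runs the replicas in parallel and gathers the outputs on the first device; in PyTorch the change is typically a single wrap such as model = torch.nn.DataParallel(model, device_ids=[0, 1]). The pure-Python sketch below only mimics that scatter/apply/gather pattern on lists, with threads standing in for GPUs. It is a conceptual illustration, not PyTorch's implementation, and the function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def scatter(batch, n_devices):
    """Split a batch into at most n_devices contiguous chunks of size ceil(len/n)."""
    if not batch:
        return []
    size = math.ceil(len(batch) / n_devices)
    return [batch[i:i + size] for i in range(0, len(batch), size)]


def data_parallel(model, batch, n_devices=2):
    """Apply the same model to each chunk concurrently, then gather in order."""
    chunks = scatter(batch, n_devices)
    if not chunks:
        return []
    # One worker per simulated device; threads show the structure, not a speedup.
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        outputs = list(pool.map(lambda chunk: [model(x) for x in chunk], chunks))
    # "Gather": concatenate per-device outputs back into one batch, in order.
    return [y for out in outputs for y in out]


def square(x):
    return x * x


print(data_parallel(square, [1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
```

Because each device sees only part of every batch, the per-GPU memory load drops, which is the usual reason to spread one training run across two GPUs.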

Acknowledgements

Many thanks to James Howard for sharing his work and reviewing these instructions.