/hi-perf-ipynb

An introduction to high-performance Python using Jupyter

Primary language: Jupyter Notebook. License: BSD 4-Clause "Original" or "Old" License (BSD-4-Clause).

Will Furnass, University of Sheffield's Research Software Engineering team, 2018.

A brief guide to parallel programming using Python and the Jupyter Notebook.

Aims

The main aim is to equip the researcher who already has some knowledge of the Python scientific computing stack with an understanding of relevant conceptual approaches to parallel programming and of practical approaches to realising those concepts using popular Python packages.

A secondary aim is to demonstrate the potential of JupyterHub on Grid Engine computer clusters as a high-performance Python programming environment for those with limited knowledge of Linux and the Unix shell. JupyterHub has been deployed on the University of Sheffield's ShARC cluster using funding from the OpenDreamKit project (see Acknowledgements). This teaching material was designed to be used on ShARC but is also relevant to other environments where Jupyter and sufficient hardware resources are available.

Learning outcomes

  • Understanding of why parallelisation is of increasing importance given the death of Moore's Law
  • Understanding of the different types of parallelism and their merits
  • Basic understanding of theoretical speedups and Amdahl's Law
  • Understanding of communication vs computation costs
  • Ability to identify and use libraries that can distribute non-Python work between threads
  • Ability to distribute Python work between processes using multiprocessing
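As a rough illustration of the theoretical speedups mentioned above, Amdahl's Law can be sketched in a few lines of Python. The function name `amdahl_speedup` and the example value of 0.9 for the parallelisable fraction are assumptions made for this sketch; they are not taken from the lesson notebooks.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Theoretical speedup under Amdahl's Law.

    parallel_fraction: share of the run time that can be parallelised (0 to 1).
    n_workers: number of workers (threads or processes) sharing that part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unaffected; only the parallel part is divided up.
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


if __name__ == "__main__":
    # Hypothetical workload in which 90% of the run time is parallelisable.
    for n in (1, 2, 4, 8, 16, 1024):
        print(f"{n:>5} workers: speedup {amdahl_speedup(0.9, n):.2f}")
```

However many workers are added, the speedup in this example cannot exceed 1 / (1 - 0.9) = 10, because the serial 10% of the run time is never reduced.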

Lessons

  1. Parallelisation using packages that support multithreading
  2. Parallelising your own code using multiple Python processes on a single machine
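To give a flavour of the second lesson, the following is a minimal sketch of distributing Python work between processes with the standard library's `multiprocessing.Pool`. The function names and the sum-of-squares workload are hypothetical, chosen only for illustration; the lesson notebook may use different examples.

```python
import multiprocessing as mp


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, similarly sized pieces."""
    chunk_size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def sum_of_squares(chunk):
    """CPU-bound work carried out independently on one chunk by a worker."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, n_processes=2):
    """Send one chunk to each worker process, then combine the partial sums."""
    chunks = split_into_chunks(items, n_processes)
    with mp.Pool(processes=n_processes) as pool:
        # Each chunk is pickled and sent to a worker: this is the
        # communication cost that has to be weighed against computation.
        partial_sums = pool.map(sum_of_squares, chunks)
    return sum(partial_sums)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1_000_000)), n_processes=4))
```

Sending a few large chunks rather than many individual numbers keeps the cost of passing data between processes small relative to the work each process does. On platforms that start workers by spawning a fresh interpreter, the worker function generally needs to live in an importable module rather than in a notebook cell.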

Further reading

  • Wilkinson, B. and Allen, M. (1999). Parallel programming: techniques and applications using networked workstations and parallel computers. Prentice Hall, Upper Saddle River, N.J. ISBN: 0-13-671710-1
  • Gorelick, M. and Ozsvald, I. (2014). High performance Python, first edition. O’Reilly, Sebastopol, CA. ISBN: 978-1-4493-6159-4
  • Using JupyterHub on the University of Sheffield's ShARC cluster

Acknowledgements

The development of this material was funded by OpenDreamKit, a Horizon 2020 European Research Infrastructure project (676541) that aims to advance the open source computational mathematics ecosystem.
