/slither

Static Python Builds for HPC Systems

Primary LanguagePythonBSD 3-Clause "New" or "Revised" LicenseBSD-3-Clause

Slither: Static Python builds for HPC Systems

Slither is a set of patches for Python (and related modules) and a command line tool for building static CPython binaries. In addition slither supports byte-compiling Python module sources into frozen modules.

In the most optimized configuration, Slither is capable of producing binaries which are entirely self-contained and strictly minimize the number of file system calls (i.e. stat and open).

Rationale

While newer HPC systems do support dynamic libraries, the support remains unoptimized. Locating and loading each shared object can result in a lot of file system contention, especially for larger jobs involving thousands of processors. This contention is also present in importing regular Python (.py) modules. Depending on the Python configuration, each imported module can require ten or more stat system calls.

This contention manifests itself as a very long start up time --- the amount of time required for the Python interpreter to start and the prerequisite modules to be imported. On one node, this is often under a minute and is generally ignorable. On a hundred nodes this can take hours. See Shared Library Performance on Hopper (slides) by Z. Zhao et al. and Python in a Parallel Environment by D. Grote.

Previous Work

The GPAW package static python builds provided instructions and patches for static Python builds, which this project builds upon.

Method of Operation

This project is composed of the following pieces.

  1. Build a custom Python 2.7.3 interpreter which contains a patched version of distutils.

  2. Use the custom interpreter to build and install Python modules. Most modules can be built with little or no additional configuration.

  3. Use the bin/slither script to build custom, statically linked, Python interpreters which bake in Python module byte-code.

Getting Started

Please see the INSTALL.md for installation instructions.

Alternative Approaches

Some tools have been developed to speed up Python startup times. For example, NERSC provides the DLcache and FMcache tools on Hopper.

DLcache is a general purpose tool to optimize the importing of shared objects (i.e., dlopen). FMcache is a more specialized tool to optimize the importing of Python modules (.py and .pyc files). In each case a cache is built by executing the program on a small number of MPI ranks. The cache is then distributed and the job run on a large number of MPI ranks. The main drawbacks are the following:

  • It must be easy to run the same job (or at least the same beginning import statements) at small width. Often this means creating a separate .py script which just imports the modules you will be using.

  • The libraries are loaded in the same order on every rank. In practice this means loading every module you might use at the beginning of your script.

  • The batch job script must be altered with several pre- and post-processing steps.

Resources