A tutorial on how to use Singularity containers for replicable science.

Singularity containers

Welcome to the Singularity tutorial repo. Here, you can find simple examples to get you started and a few simple use cases. The slides are also available.

A toy example

Assume you have the best decision tree learner there is, in the form of a simple Python codebase. Your friend Bob wishes to run the algorithm on his hardware stack, which differs somewhat from yours.

In theory, being strict about it, you have pinned the versions of all libraries (dependencies). But is that enough?
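Pinned library versions cover only the Python packages; the interpreter, the operating system, and the system libraries can still differ between your machine and Bob's. A minimal sketch of how to see what lies outside the pinned dependencies (the function name `environment_snapshot` is made up for this illustration):

```python
import platform


def environment_snapshot():
    """Collect details of the host that a list of pinned packages does not record."""
    return {
        "python": platform.python_version(),
        "system": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),
        "libc": " ".join(platform.libc_ver()),
    }


if __name__ == "__main__":
    for key, value in environment_snapshot().items():
        print(f"{key}: {value}")
```

Running this on two machines and comparing the output shows which parts of the stack a container would need to fix in place.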

Enter Singularity

Singularity containers are the scientific twin of the widely used Docker containers. Well suited to fast prototyping, they offer a simple way to create persistent environments in which your code runs seamlessly. To help Bob, you decide to:

  1. Teach him how to install the Singularity environment on a given machine (this is a must-have).
  2. Teach him how to run a pre-compiled Singularity image.
  3. Teach him how to compile his own containers.
  4. Teach him how to perform basic image manipulations.

Of course, these are only some of the possible examples, but enough to make Bob the master of Singularity in his small realm.
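Steps 2 to 4 can be sketched as a few command lines, assembled here in Python so they can be inspected before anything is executed. This is a rough sketch, not the tutorial's own scripts: the image name, the definition file name, and the `docker://python:3.10-slim` source are assumptions, and `sudo` is included only because building from a definition file typically needs elevated rights.

```python
import shutil
import subprocess


def pull_cmd(image, uri):
    """Step 2: fetch a pre-built image, for example from a Docker registry."""
    return ["singularity", "pull", image, uri]


def exec_cmd(image, *command):
    """Step 2: run a command inside an existing image."""
    return ["singularity", "exec", image, *command]


def build_cmd(image, definition):
    """Step 3: build an image from a definition file (typically needs root)."""
    return ["sudo", "singularity", "build", image, definition]


def inspect_cmd(image):
    """Step 4: show the metadata stored in an image."""
    return ["singularity", "inspect", image]


def run(cmd, dry_run=True):
    """Print the command; execute it only if asked to and Singularity is installed."""
    print(" ".join(cmd))
    if dry_run or shutil.which("singularity") is None:
        return None
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names, used only for illustration.
    run(pull_cmd("python.sif", "docker://python:3.10-slim"))
    run(exec_cmd("python.sif", "python", "--version"))
    run(build_cmd("tree.sif", "tree.dsc"))
    run(inspect_cmd("tree.sif"))
```

By default the script only prints the commands; pass `dry_run=False` to `run` on a machine where Singularity is installed.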

Interim take-home message: if possible, take the extra step of providing the .dsc file alongside the image, so that others can properly replicate the key parts of the codebase.
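To make this concrete, here is a sketch of what such a file could contain for the toy example, written out from Python so it can be generated next to the code. The base image, the `/opt` paths, the `requirements.txt` file, and the name `tree.dsc` are all assumptions made for illustration; only the section keywords (`Bootstrap`, `From`, `%files`, `%post`, `%runscript`) are standard definition-file syntax.

```python
from pathlib import Path

# Hypothetical recipe for packaging tree.py; base image and paths are assumptions.
DEFINITION = """\
Bootstrap: docker
From: python:3.10-slim

%files
    tree.py /opt/tree.py
    requirements.txt /opt/requirements.txt

%post
    pip install --no-cache-dir -r /opt/requirements.txt

%runscript
    exec python /opt/tree.py "$@"
"""


def write_definition(path="tree.dsc"):
    """Write the recipe to disk so it can be shared alongside the image."""
    target = Path(path)
    target.write_text(DEFINITION)
    return target


if __name__ == "__main__":
    print(write_definition())
```

Sharing the recipe together with the pinned `requirements.txt` lets others rebuild the image themselves rather than trusting an opaque binary.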

Extra step - GPUs and HPC

Assume Bob got tree.py working on GPUs. How can he scale it with Singularity?
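One possible starting point is the `--nv` flag of `singularity exec`, which exposes the host's NVIDIA driver and devices to the container. The sketch below only assembles the command line; the image name `tree.sif`, the bind path, and the use of `srun` as a launcher (which assumes a Slurm cluster) are illustrative assumptions, not part of this tutorial.

```python
def gpu_exec_cmd(image, *command, binds=(), launcher=()):
    """Build a GPU-enabled exec command.

    binds:    host paths (or "src:dest" pairs) to mount into the container
    launcher: optional prefix such as ("srun",) on a Slurm cluster (assumption)
    """
    cmd = list(launcher) + ["singularity", "exec", "--nv"]
    for bind in binds:
        cmd += ["--bind", bind]
    return cmd + [image, *command]


if __name__ == "__main__":
    # Single workstation with a GPU.
    print(" ".join(gpu_exec_cmd("tree.sif", "python", "tree.py")))
    # Hypothetical cluster run with a data directory mounted at /data.
    print(" ".join(gpu_exec_cmd("tree.sif", "python", "tree.py",
                                binds=["/scratch/bob:/data"], launcher=("srun",))))
```

Because the image is a single file, the same `tree.sif` can be copied to the cluster and launched there without rebuilding.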

Additional resources

  1. Sling tutorial slides

  2. Detailed tutorial by Barbara Krašovec

  3. First steps - BSC

  4. A bit more extensive tutorial.

Isn't this kind of like Docker? Why Singularity?

The following are the main reasons for using Singularity for HPC (credit: Reddit).

  1. Singularity effectively runs as the invoking user and doesn't result in elevated access. Docker, by default, does not work this way, which can be problematic from a security standpoint.
  2. Singularity runs container processes without a daemon. They just run as child processes.
  3. Singularity is better for command line applications and accessing devices like GPUs or MPI hardware without jumping through hoops.
  4. Docker provides no built-in way to allow users to start containers that are certain to have limited access rights.
  5. Docker images are stored in the local image cache, and you're expected to interact with them using the docker image command, e.g. docker image ls. In contrast, Singularity images are just normal files on your filesystem: once you have a SIF file, you can run it.
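Points 2 and 5 can be illustrated together: a SIF image can be handled with ordinary file tools, and a container started from it is simply a child process of whatever launched it. The following is a small sketch under those assumptions; the helper names and `tree.sif` are hypothetical.

```python
import os
import shutil
import subprocess


def copy_image(image, destination):
    """A SIF image is a regular file, so sharing it is a plain file copy."""
    return shutil.copy2(image, destination)


def run_cmd(image, *args):
    """Command line that executes the image's runscript."""
    return ["singularity", "run", image, *args]


def run_as_child(image, *args):
    """Start the container as a child process of this script, with no daemon involved.

    Returns None when Singularity or the image file is not available.
    """
    if shutil.which("singularity") is None or not os.path.isfile(image):
        return None
    return subprocess.Popen(run_cmd(image, *args))


if __name__ == "__main__":
    child = run_as_child("tree.sif")
    if child is None:
        print("Nothing to run: Singularity or tree.sif is missing.")
    else:
        print("Container running as child process with PID", child.pid)
        child.wait()
```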