Data Science Deployments

These are recipes for deploying data science environments with differing compute capabilities. We use nixops to declare the machine configuration reproducibly. The common directory shows how similar configuration can be shared between deployments.

In nix each kernel is an encapsulated environment, so we use the terms kernel and environment interchangeably: creating a new environment for users is the same as creating an additional kernel. Conda has taken a similar approach with nb_conda_kernels.
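Because a kernel and an environment are the same thing here, listing the kernels Jupyter can see is one way to list the environments available to users. The following is a minimal sketch, assuming `jupyter` is on the PATH of the machine you run it on; `list_kernel_names` is a hypothetical helper name, not something this repository provides.

```shell
# Sketch: print the name of every kernel Jupyter can see, one per line.
# list_kernel_names is a hypothetical helper, not part of this repository.
list_kernel_names() {
    # `jupyter kernelspec list` prints a header line followed by
    # "<name> <path>" rows; skip the header and keep the first column.
    jupyter kernelspec list | awk 'NR > 1 && NF > 0 { print $1 }'
}
```

Calling `list_kernel_names` on a deployed node should print one line per kernel environment.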

This deployment is opinionated. We provide a consistent environment across multiple architectures.

  • many customizable kernel environments: Python (2, 3.7, 3.8), C, Rust, R, Ansible, Nix, Bash, Ruby
  • user configuration in common/users.nix
  • jupyterhub
      • simple: PAM authentication, SystemD spawner
      • slurm: PAM authentication, Batchspawner (slurm) spawner

Usage

nixops has a few commands that cover most of the workflow. We use libvirt for testing, but nixops is also capable of cloud deployments. We first create the deployment.

nixops create -d <deployment-name> <path to deployment.nix>

We can view our deployment.

nixops list

Next we deploy.

nixops deploy -d <deployment-name>

Finally we can access information about the deployment.

nixops info -d <deployment-name>
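The four steps above can be chained in a small script. This is only a sketch: `run`, `nixops_cycle`, and the `DRY_RUN` variable are hypothetical names introduced for illustration. `DRY_RUN` defaults to 1, so the commands are printed rather than executed unless you set it to 0.

```shell
#!/bin/sh
# Sketch: run (or just print) the create -> list -> deploy -> info sequence.
# DRY_RUN is a hypothetical variable for this example; it defaults to 1,
# so nothing is executed unless DRY_RUN=0 is set explicitly.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: show the command instead of executing it.
        echo "$*"
    else
        "$@"
    fi
}

nixops_cycle() {
    name="$1"   # deployment name, e.g. simple
    expr="$2"   # path to the deployment expression, e.g. simple/deployment.nix

    # Stop at the first step that fails.
    run nixops create -d "$name" "$expr" &&
    run nixops list &&
    run nixops deploy -d "$name" &&
    run nixops info -d "$name"
}

# Print the commands for the simple deployment without running them.
nixops_cycle simple simple/deployment.nix
```

Printing by default is a deliberate choice for a sketch like this: it lets you check the deployment name and path before anything touches libvirt or a cloud provider.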

Deployment

Simple

The simple deployment is a single-node deployment of jupyterhub with the many kernels available in kernels.nix. Deployment takes around 2 minutes.

Create

nixops create -d simple simple/deployment.nix

Deploy

nixops deploy -d simple

Info

nixops info -d simple

Slurm

The slurm deployment is a multi-node deployment of jupyterhub with a development environment identical to the simple deployment. It uses slurm to distribute user jobs. Deployment takes around 5 minutes.

Create

nixops create -d slurm slurm/deployment.nix

Deploy

nixops deploy -d slurm

Info

nixops info -d slurm

Issues to upstream

  • dask-gateway-scheduler and dask-gateway-worker are no longer the default (which is awesome)
  • an NFS home directory shared between users is a requirement that jobqueue does not document
  • the dashboard does not show up at the moment (dask/distributed#3741)

Nomad

Create

nixops create -d nomad nomad/deployment.nix

Deploy

nixops deploy -d nomad

Info

nixops info -d nomad
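All three deployments follow the same `<name>/deployment.nix` layout, so a quick check that each expression exists can save a failed `nixops create`. The sketch below assumes it is given the root of a checkout; `check_deployments` and its `root` argument are hypothetical names for this example.

```shell
# Sketch: report which of the three deployment expressions exist.
# check_deployments is a hypothetical helper, not part of this repository.
check_deployments() {
    root="${1:-.}"   # directory that contains simple/, slurm/ and nomad/
    status=0
    for name in simple slurm nomad; do
        if [ -f "$root/$name/deployment.nix" ]; then
            echo "ok      $name/deployment.nix"
        else
            echo "missing $name/deployment.nix"
            status=1
        fi
    done
    # Non-zero when at least one expression is missing.
    return "$status"
}
```

Run it as `check_deployments .` from the repository root; a non-zero exit status means at least one deployment expression was not found.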