Your data science lab, in a box

This repository contains two example deployments of multi-user, isolated environments using JupyterHub. They are aimed at small research or data science teams.

The first one authenticates users using GitHub OAuth.

The second one also contains a self-hosted GitLab instance, which can be used for authentication and everything else (e.g. CI/CD and a docker registry). It also contains an Nginx service as a reverse proxy.

Although these deployments have been tested on a single machine, they can be scaled to multiple nodes using Docker Swarm (see jupyterhub/dockerspawner#216).

Note that this is not meant as a guide or complete tutorial. If you want to learn more about Jupyter(hub)'s architecture and configuration options, check out:

What's Jupyter?

Most people associate the Jupyter project (formerly known as ipython server) with notebooks. But it is way more than that: it is a FANTASTIC project and community! It includes many actively developed open source projects that go way beyond the original idea of notebooks and kernels. Moreover, most of these projects are cloud-oriented. Just to name a few:

In this repository we set up JupyterHub, which extends Jupyter by providing multi-user support, authentication, and different isolation/deployment options.

Requirements

  • Docker
  • Docker-compose
  • Docker-machine (recommended)

Setup

  • Create a machine
  • Add SSH key
  • Configure a DNS wildcard for your domain (if you don't own a domain, check out http://nip.io/ or http://xip.io)
  • For convenience, change the SSH port to something other than 22 (e.g. 2222):
vi /etc/ssh/sshd_config
systemctl restart sshd
  • Install docker. The easiest way is to use docker-machine:
docker-machine create --driver generic --generic-ip-address=lab.todevnull.com --generic-ssh-key ~/.ssh/id_rsa --generic-ssh-port 2222 labinabox
  • Set up your environment to start using the remote docker:
eval $(docker-machine env labinabox)
docker info
  • The docker spawner does not fetch the single-user image automatically, so you will have to pull it manually:
docker pull jupyter/scipy-notebook:latest
  • Create a folder for user homes (workspaces) and give the user of the single-user image (uid 1000, gid 100) write permissions:
docker-machine ssh labinabox 'mkdir -p /mnt/home'
docker-machine ssh labinabox 'chown -R 1000:100 /mnt/home'
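
If you rely on nip.io for the DNS wildcard step above, the hostnames simply embed the IP address of the machine. The following is a minimal sketch of how such names could be built; nip_host is a hypothetical helper (not part of this repository) and 203.0.113.10 is a placeholder address, so substitute the public IP of your own machine.

```shell
# nip_host SERVICE IP
# Hypothetical helper: prints a hostname of the form <service>.<ip>.nip.io,
# which nip.io resolves to <ip>.
nip_host() {
    printf '%s.%s.nip.io\n' "$1" "$2"
}

# Placeholder IP from the documentation range; replace it with your own.
nip_host hub 203.0.113.10   # → hub.203.0.113.10.nip.io
nip_host lab 203.0.113.10   # → lab.203.0.113.10.nip.io
```

The resulting names can be used wherever this document uses hub.$DOMAIN or lab.$DOMAIN.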

SSL

This demo assumes you have a valid certificate (/etc/ssl/ssl-custom/cert.pem) and a key (/etc/ssl/ssl-custom/key.pem) for your domain.

Certbot

You're encouraged to use a certificate from a trusted certificate authority such as Let's Encrypt. Using certbot is pretty straightforward. It even comes as a docker image with a built-in standalone server:

LE_VERSION=v0.14.0
DOMAIN=todevnull.com
docker run -ti --rm -p 80:80 -p 443:443 --name certbot \
    -v '/data/letsencrypt/etc/letsencrypt/:/etc/letsencrypt' \
    -v '/data/letsencrypt/var/lib/letsencrypt:/var/lib/letsencrypt' \
    -v '/var/www/letsencrypt/:/webroot' \
    certbot/certbot:$LE_VERSION certonly --standalone \
    --expand --keep \
    -d hub.$DOMAIN -d lab.$DOMAIN -d registry.$DOMAIN -d github.$DOMAIN -d chat.$DOMAIN
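
Long lists of -d flags are easy to get wrong by hand. As a rough sketch, the list could be generated from the subdomain names instead; domain_args is a hypothetical helper (not part of this repository or of certbot) that prints one -d flag per subdomain and skips repeated names.

```shell
# domain_args DOMAIN SUBDOMAIN...
# Hypothetical helper: prints "-d sub.DOMAIN" for every subdomain,
# ignoring names that were already listed.
# The body runs in a subshell so its variables do not leak to the caller.
domain_args() (
    domain=$1
    shift
    seen=" "
    out=""
    for sub in "$@"; do
        case "$seen" in
            *" $sub "*) continue ;;   # duplicate, already emitted
        esac
        seen="$seen$sub "
        out="$out -d $sub.$domain"
    done
    # Drop the leading space before printing.
    printf '%s\n' "${out# }"
)

domain_args todevnull.com hub lab registry chat hub
# → -d hub.todevnull.com -d lab.todevnull.com -d registry.todevnull.com -d chat.todevnull.com
```

The output is meant to be passed unquoted, so that the shell splits it into separate arguments, e.g. `certonly --standalone $(domain_args "$DOMAIN" hub lab registry)` at the end of the docker run command.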

Now, simply copy the generated certificates to the paths the demos expect:

docker-machine ssh labinabox "mkdir -p /etc/ssl/ssl-custom"
docker-machine ssh labinabox "cp -L /data/letsencrypt/etc/letsencrypt/live/hub.$DOMAIN/privkey.pem /etc/ssl/ssl-custom/key.pem"
docker-machine ssh labinabox "cp -L /data/letsencrypt/etc/letsencrypt/live/hub.$DOMAIN/fullchain.pem /etc/ssl/ssl-custom/cert.pem"

Self-signed

For a simple test, you can also generate your own self-signed certificates using openssl:

export DOMAIN=<YOUR DOMAIN NAME>
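mkdir -p ssl-custom
# -nodes leaves the key unencrypted, so the services can read it without a passphrase
openssl req -x509 -newkey rsa:4096 -nodes -keyout ssl-custom/key.pem -out ssl-custom/cert.pem -days 365 -subj "/C=ES/ST=Madrid/L=Madrid/O=Lab in a Box/OU=Org/CN=*.${DOMAIN}"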

docker-machine scp -r ssl-custom labinabox:/etc/ssl/
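
Before copying the files, it may be worth checking that the certificate and the key actually belong together. The sketch below compares the public key embedded in the certificate with the one derived from the private key; cert_matches_key is a hypothetical helper, and it assumes the key is not passphrase-protected (otherwise openssl will prompt for the passphrase).

```shell
# cert_matches_key CERT KEY
# Hypothetical helper: succeeds when the public key in CERT equals the
# public key derived from the private key in KEY.
cert_matches_key() {
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 1
    key_pub=$(openssl pkey -in "$2" -pubout) || return 1
    [ "$cert_pub" = "$key_pub" ]
}

# Only run the check if the files from the previous step are present.
if [ -f ssl-custom/cert.pem ] && [ -f ssl-custom/key.pem ]; then
    if cert_matches_key ssl-custom/cert.pem ssl-custom/key.pem; then
        echo "certificate and key match"
        # Print the subject and the expiry date of the certificate.
        openssl x509 -in ssl-custom/cert.pem -noout -subject -enddate
    else
        echo "certificate and key do NOT match" >&2
    fi
fi
```

The same function can be pointed at the certbot-generated files under /etc/ssl/ssl-custom on the server.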

Notes

  • Instead of building a custom image, nginx should use the vanilla nginx docker image with its configuration bind-mounted, but that requires syncing the configuration files with the server.
  • Do not even consider deploying an environment like the one in this demo without a backup strategy: http://www.taobackup.com/
  • Folder permissions should be more restrictive. You can chown the files to the default uid and gid of the jupyter image.
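
As a starting point for the backup note above, the user homes could be archived periodically with tar. This is only a sketch and not a backup strategy by itself: backup_homes is a hypothetical helper, the /mnt/backup destination is an assumption, and the destination must live outside the source folder (ideally on another machine).

```shell
# backup_homes SRC DEST
# Hypothetical helper: archives the contents of SRC into
# DEST/homes-<UTC timestamp>.tar.gz and prints the archive path.
backup_homes() {
    src=$1
    dest=$2
    stamp=$(date -u +%Y%m%dT%H%M%SZ)
    archive="$dest/homes-$stamp.tar.gz"
    mkdir -p "$dest" || return 1
    # -C makes the paths inside the archive relative to SRC.
    tar -czf "$archive" -C "$src" . || return 1
    printf '%s\n' "$archive"
}

# Example, to be run on the docker host (both paths are assumptions):
#   backup_homes /mnt/home /mnt/backup
```

An archive can be inspected with `tar -tzf <archive>` and restored with `tar -xzf <archive> -C <target folder>`.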