
Base image for the NASA-Openscapes Jupyterhub

Primary language: Shell. License: MIT.

corn 🌽

Jupyterhub base image for the NASA Openscapes Hub

Overview

This project provisions a multi-kernel Docker base image for JupyterHub deployments.

In collaborative efforts, like this NASA hackathon, multiple teams work on different stacks, and we often run into situations where Team A needs Python 3.8 with, say, xarray v0.14 while Team B needs Python 3.9 and xarray v0.17. A simple solution would be to reconcile these two environments so both teams can run their code. However, this is not always straightforward or even possible. A multi-kernel base image for JupyterHub deployments sidesteps the problem by letting each team keep its own environment.

corn builds on Pangeo's amazing base image, installs every environment it finds under ci/environments, and makes each one available as a kernel, so users can select the kernel that fits their needs. Adding a kernel only requires a conda environment.yml file (pip dependencies can be included in environment.yml) and a name file:

  • environment.yml: conda environment file
  • name.txt: the name for the environment; it can be the same as the one used in the environment file
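The build step described above can be sketched as a shell loop. This is a minimal illustration, not corn's actual build script: it assumes conda is on the PATH, that ipykernel can be installed from conda-forge, and that NB_PYTHON_PREFIX points at the environment running the Jupyter server (as in Pangeo's base image). The run helper and the DRY_RUN variable are hypothetical, added so the commands can be previewed without building anything.

```shell
# Minimal sketch, NOT corn's actual build script: discover each folder under
# ci/environments, create a conda environment from its environment.yml, and
# register it as a Jupyter kernel named after name.txt.
# Assumptions: conda is on PATH; NB_PYTHON_PREFIX is the environment that runs
# the Jupyter server. run/DRY_RUN are hypothetical helpers for previewing.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

install_kernels() {
  local root="${1:-ci/environments}"
  local prefix="${NB_PYTHON_PREFIX:-/srv/conda/envs/notebook}"
  local dir name
  for dir in "$root"/*/; do
    dir="${dir%/}"
    # Skip folders that are missing either required file.
    if [ ! -f "$dir/environment.yml" ] || [ ! -f "$dir/name.txt" ]; then
      continue
    fi
    # Strip whitespace (including the trailing newline) from the name.
    name="$(tr -d '[:space:]' < "$dir/name.txt")"
    run conda env create --name "$name" --file "$dir/environment.yml"
    # Optional pip dependencies, per the requirements.txt note in this README.
    if [ -f "$dir/requirements.txt" ]; then
      run conda run --name "$name" python -m pip install -r "$dir/requirements.txt"
    fi
    # ipykernel is needed to register the environment as a kernel.
    run conda install --name "$name" --channel conda-forge --yes ipykernel
    # Install the kernelspec where the Jupyter server environment will find it.
    run conda run --name "$name" python -m ipykernel install \
      --prefix "$prefix" --name "$name" --display-name "$name"
  done
}
```

For example, DRY_RUN=1 install_kernels ci/environments lists the commands that would run for each complete environment folder.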

Adding a new kernel

To add a new kernel, create a new folder under ci/environments/ and add the two files described above. Say we want to run our amazing new notebook that uses pandas and Python 3.10.

We will need a conda environment file, environment.yml:

name: amazing-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pandas>=1.3
  - pip

and our name.txt file:

amazing-env

That's it!

Note: if you have pip-installable dependencies, they must be listed in a requirements.txt file.
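Creating the two files for a new kernel can also be scripted. The new_kernel function here is a hypothetical convenience helper, not something shipped with corn; it writes the amazing-env example from this section and refuses to overwrite an existing folder.

```shell
# Hypothetical helper (not part of corn): scaffold environment.yml and name.txt
# for a new kernel under ci/environments/<name>/.
new_kernel() {
  local name="$1" root="${2:-ci/environments}"
  local dir="$root/$name"
  # Never overwrite an existing environment folder.
  if [ -e "$dir" ]; then
    echo "error: $dir already exists" >&2
    return 1
  fi
  mkdir -p "$dir"
  # Same contents as the environment.yml example in this section.
  cat > "$dir/environment.yml" <<EOF
name: $name
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pandas>=1.3
  - pip
EOF
  # name.txt reuses the environment name.
  printf '%s\n' "$name" > "$dir/name.txt"
}
```

Running new_kernel amazing-env from the repository root creates ci/environments/amazing-env/ with both files, which you can then edit.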

Updating Quarto

To update the Quarto installation, change the version number in corn's Dockerfile. Committing the change triggers the GitHub Action build described in the next section.
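If the version is declared on a single line in the Dockerfile, the bump can be a one-line sed edit. This sketch assumes a line of the form ENV QUARTO_VERSION=<version> (or ARG QUARTO_VERSION=<version>) in a Dockerfile at ci/Dockerfile; both the line format and the path are assumptions, so check corn's Dockerfile and adjust the pattern. bump_quarto is a hypothetical helper.

```shell
# Hypothetical sketch: set the Quarto version in a Dockerfile, ASSUMING it is
# declared on one line such as "ENV QUARTO_VERSION=<version>".
bump_quarto() {
  local new_version="$1" dockerfile="${2:-ci/Dockerfile}"
  # -i.bak works with both GNU and BSD sed; the backup is removed afterwards.
  sed -i.bak -E "s/^(ENV|ARG)([[:space:]]+QUARTO_VERSION=).*/\1\2${new_version}/" "$dockerfile" &&
    rm -f "$dockerfile.bak"
}
```

Usage: bump_quarto <new-version>, then review the diff before committing.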

Updating the image in the JupyterHub

Committing changes to the main branch of this repo triggers the GitHub Action build, which then pushes the resulting Docker image to Docker Hub, tagged with the commit hash. This can take about 20 minutes.

You can try the newly created image with the "Bring your own image" feature in the JupyterHub. Specify the image as openscapes/python:$TAG, where $TAG is the tag of the Docker Hub image (the same as the commit hash). You can copy the name from the docker pull command shown on Docker Hub.
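The image name for a given commit can be built from the commit hash. This sketch assumes the tag is the full hash returned by git rev-parse HEAD; if Docker Hub lists a shortened hash, pass the tag exactly as shown there. image_ref is a hypothetical helper.

```shell
# Sketch: print the image reference for a commit. ASSUMES the Docker Hub tag
# is the full commit hash; pass a tag explicitly to override.
image_ref() {
  local tag="${1:-$(git rev-parse HEAD)}"
  printf 'openscapes/python:%s\n' "$tag"
}

# Example (requires Docker and a git checkout of this repo):
# docker pull "$(image_ref)"
```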

Once you've verified it works the way you want, we need to update the Python image in our JupyterHub configuration. The quickest way to do this is to create a pull request here that updates openscapes/python:$TAG with the new tag (commit hash). For 2i2c deployments, there is also a GUI that lets administrators make this change.
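If the Hub configuration references the image as a single openscapes/python:<tag> string, the change in that pull request can be made with sed. The configuration file path is not given here, so it is passed explicitly. set_image_tag is a hypothetical helper, and if the configuration stores the image name and tag as separate fields, this pattern will not match.

```shell
# Hypothetical sketch: rewrite every "openscapes/python:<tag>" reference in a
# config file to a new tag. Docker tags use letters, digits, "_", "." and "-".
set_image_tag() {
  local new_tag="$1" config="$2"
  sed -i.bak -E "s#(openscapes/python:)[A-Za-z0-9_.-]+#\1${new_tag}#g" "$config" &&
    rm -f "$config.bak"
}
```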

Then go to https://openscapes.2i2c.cloud/hub/home > Stop My Server (or File > Log Out) to stop your server, and start it again. Your server should now run the updated Docker image.

Note: 2i2c appears to cache the user image, so a tag like main won't be re-pulled even when it points to new changes. Using the actual commit hash is better practice for now.
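Given this caching behavior, it can help to reject moving tags such as main before they reach the configuration. This sketch treats 7 to 40 lowercase hexadecimal characters as a commit hash, which is an assumption about how the CI names its tags; is_commit_tag is a hypothetical helper.

```shell
# Sketch: succeed only if the tag looks like a git commit hash (7-40 lowercase
# hex characters), so moving tags like "main" or "latest" are rejected.
is_commit_tag() {
  case "$1" in
    ''|*[!0123456789abcdef]*) return 1 ;;
  esac
  [ "${#1}" -ge 7 ] && [ "${#1}" -le 40 ]
}
```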

Testing changes to the image locally

If you want to test your changes locally (i.e., without building in GitHub Actions and pushing to Docker Hub), you can do so on your own computer using Docker:

  1. Install Docker Desktop
  2. Make sure Docker is running, then build the image with:
# from the repository root, change into the "ci" directory
cd ci
docker build --platform linux/amd64 -t openscapes/corn:test .

The --platform linux/amd64 flag is only necessary if your machine does not have an x86-64 chip architecture (e.g., an M1 or M2 Mac, which is ARM-based).
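To avoid remembering when the flag is needed, the build command can choose it based on the host architecture reported by uname -m. The platform_flag helper is hypothetical, and the list of x86-64 names it recognizes (x86_64, amd64) is an assumption that covers common Linux and Intel macOS hosts.

```shell
# Sketch: print "--platform linux/amd64" only on non-x86-64 hosts.
# An architecture name can be passed for testing; defaults to `uname -m`.
platform_flag() {
  local arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64|amd64) ;;  # native architecture: no flag needed
    *) printf '%s\n' '--platform linux/amd64' ;;
  esac
}

# Usage (the unquoted substitution intentionally splits into two arguments):
# docker build $(platform_flag) -t openscapes/corn:test .
```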

Once the image has been built, you can run it with:

docker run -p 8888:8888 --platform linux/amd64 openscapes/corn:test jupyter lab --ip=0.0.0.0

If a browser doesn't open automatically, open one of the links printed in the output. It will look something like:

http://127.0.0.1:8888/lab?token=a74663dba15a5e5cab52ef4bd6a9346034fd1ab927f6a29b
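If you run the container in the background instead, the login URL can be recovered from its logs. This sketch assumes the URL has the same shape as the example above (port 8888, hexadecimal token); lab_url and the container name corn-test are hypothetical.

```shell
# Sketch: read server output on stdin and print the first JupyterLab URL
# (including its token) that matches the local example URL format.
lab_url() {
  grep -Eo 'http://127\.0\.0\.1:8888/lab\?token=[0-9a-f]+' | head -n 1
}

# Usage (requires Docker; Jupyter logs to stderr, hence 2>&1):
# docker run -d --name corn-test -p 8888:8888 --platform linux/amd64 \
#   openscapes/corn:test jupyter lab --ip=0.0.0.0
# docker logs corn-test 2>&1 | lab_url
```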

Note that the home directory (/home/jovyan) will look different from what you are used to in the Hub. In the local image, the home directory still contains artifacts from the image build process, while in the Hub a shared AWS NFS drive is mounted at /home/jovyan, giving you access to your persistent Hub home directory.

What's next?

This is an effective but probably inefficient way of building environments. Exploring staged partial builds in Docker, or using conda-store to build each environment and then pulling them into a Docker image, may be more efficient.

The final image size depends on each environment's dependencies, so avoiding multiple Python versions is still recommended.