This project provides the means of building Jupyter stack images for Docker which can run correctly in a secured Docker environment.
The Jupyter Docker images already run as a non-root user, but the way this is done is not sufficient for a secured Docker environment, that is, one which takes every step to ensure that you cannot run as root or otherwise breach the security of other applications running in a multi-tenant hosting environment for Docker images.
The problems with the Jupyter images are as follows:
- Jupyter images set USER to a named user rather than an integer UID. This means that the hosting service cannot properly verify that the user the container runs as will not actually be root, because an image could have added the named user but given it the root UID of 0. It would not be possible to detect this from looking at the Docker image metadata. In a secured Docker environment, the USER of any image should always be an integer UID.
- Jupyter images will not run if the hosting environment overrides the UID that the container runs as with a value different from that specified by USER. This is because directories and files have ownership and permissions which prohibit the assigned user from reading or writing them. To work in such an environment, all directories and files created should have group ownership set to the GID of the root group, this being the default GID used by Docker when containers are run. The HOME environment variable should also be set explicitly to deal with the case where the UID assigned at run time does not have a corresponding UNIX account.
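The check a hosting service would want to make on the USER value can be sketched in a few lines of shell. This is only an illustration of the reasoning above, not code from this project or from any platform: the function name is_safe_user is hypothetical, the sample values are made up, and only the plain UID form is handled (a user:group form would be rejected).

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the USER value is an integer UID
# other than 0. A named user is rejected because image metadata alone
# cannot show which UID the name maps to.
is_safe_user() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit (a name)
    esac
    [ "$1" -ne 0 ]                 # numeric, but UID 0 is root
}

# Illustrative values only.
for user in jovyan 1000 0; do
    if is_safe_user "$user"; then
        echo "$user: accepted"
    else
        echo "$user: rejected"
    fi
done
```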
Although substitute images are provided here, this is seen as an interim measure. Ideally the original Jupyter images can be modified to work correctly.
The Docker build files provided are a small shim on top of the official Jupyter stack images. All the Dockerfile does for each image is:
- Set group ownership for directories/files under /home/jovyan to the root group.
- Set group ownership for directories/files under /opt/conda to the root group.
- Ensure that directories under /home/jovyan are accessible/writable to the root group.
- Ensure that directories under /opt/conda are accessible/writable to the root group.
- Ensure that files under /home/jovyan are readable/writable to the root group.
- Ensure that files under /opt/conda are readable/writable to the root group.
- Set the HOME environment variable to /home/jovyan.
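The ownership and permission steps above amount to roughly the following shell commands. This is a sketch of the technique, not the actual contents of the Dockerfile in this repository: the function name fix_group_access is hypothetical, and the group is taken as an optional parameter (defaulting to GID 0) so that the function can be tried on a scratch directory without being root.

```shell
#!/bin/sh
# Hypothetical helper mirroring the steps listed above for one directory tree.
#   $1: directory tree to fix
#   $2: group to assign (defaults to 0, the root group)
fix_group_access() {
    dir=$1
    group=${2:-0}
    # Give the group ownership of everything under the tree.
    chgrp -R "$group" "$dir" &&
    # Directories: group may read, write and traverse.
    find "$dir" -type d -exec chmod g+rwx {} + &&
    # Files: group may read and write.
    find "$dir" -type f -exec chmod g+rw {} +
}
```

In an image build this would be run as root against /home/jovyan and /opt/conda, with HOME set separately through the Dockerfile ENV instruction.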
In addition to these fixes to the Jupyter stack images, the new derived images also set image labels which enable them to be used as Source-to-Image (S2I) builders. Such builders can be used in conjunction with the s2i tool to build Docker images which combine the base images with files from a Git repository, without a user needing to know how to write a Dockerfile themselves. This makes it very easy to bundle up a base image with a set of notebooks into an image for distribution or deployment, such as in a teaching environment. The S2I function also works with OpenShift 3, enabling one-click deployment of Jupyter images along with any required notebooks.
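Fixed-up variants are provided for the following Jupyter images:
- all-spark-notebook
- datascience-notebook
- minimal-notebook
- pyspark-notebook
- r-notebook
- scipy-notebook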
Select the image you require for the work you need to do. Then run the OpenShift command line tool command oc new-build with this repository, the context directory set to the image name, and an output image name.
It is important to use the output image names listed so they match the application templates and don't clash with the name of the original Jupyter image.
The commands to build each of the following would therefore be as follows:
all-spark-notebook
oc new-build https://github.com/GrahamDumpleton/openshift3-jupyter-stacks.git --name all-spark-notebook-img --context-dir=all-spark-notebook
datascience-notebook
oc new-build https://github.com/GrahamDumpleton/openshift3-jupyter-stacks.git --name datascience-notebook-img --context-dir=datascience-notebook
minimal-notebook
oc new-build https://github.com/GrahamDumpleton/openshift3-jupyter-stacks.git --name minimal-notebook-img --context-dir=minimal-notebook
pyspark-notebook
oc new-build https://github.com/GrahamDumpleton/openshift3-jupyter-stacks.git --name pyspark-notebook-img --context-dir=pyspark-notebook
r-notebook
oc new-build https://github.com/GrahamDumpleton/openshift3-jupyter-stacks.git --name r-notebook-img --context-dir=r-notebook
scipy-notebook
oc new-build https://github.com/GrahamDumpleton/openshift3-jupyter-stacks.git --name scipy-notebook-img --context-dir=scipy-notebook
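If you want all six images, the commands above can be generated in a loop. The following is a convenience sketch rather than part of this project: the names run, build_all, REPO, IMAGES and DRY_RUN are all hypothetical. By default it only prints the commands; setting DRY_RUN=0 would execute them, which assumes you are logged in to an OpenShift project with oc.

```shell
#!/bin/sh
REPO=https://github.com/GrahamDumpleton/openshift3-jupyter-stacks.git
IMAGES="all-spark-notebook datascience-notebook minimal-notebook pyspark-notebook r-notebook scipy-notebook"

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One build per image, using the <image>-img output names listed above.
build_all() {
    for image in $IMAGES; do
        run oc new-build "$REPO" --name "$image-img" --context-dir="$image"
    done
}

build_all
```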
If you only need an empty Jupyter Notebook instance and will upload any notebooks manually, you can deploy the image directly. For example, if you need minimal-notebook, you would run:
oc new-app minimal-notebook-img --name my-notebook
oc expose service my-notebook
By default the service will be exposed via HTTP and will not be password protected. It is recommended you modify the route to enable TLS edge termination.
To enable a password, use the following instead of the commands above:
oc new-app minimal-notebook-img --name my-notebook --env PASSWORD=mypassword
oc expose service my-notebook
To have a set of notebooks and other data files combined with the image and deployed in one step, run the image as an S2I builder. That is, give the image name followed by ~ and the URL of the Git repository containing the notebooks.
oc new-app minimal-notebook-img~https://github.com/jrjohansson/scientific-python-lectures.git --name my-notebook
oc expose service my-notebook
As well as being able to deploy instances of the Jupyter Notebooks from the command line using the above commands, application templates are also provided. Once loaded these can be used from the OpenShift web console or the command line.
These templates will automatically provision a password and also enable a secure HTTPS route for accessing the instance.
To load the application templates for the image you are interested in, use the appropriate command below.
all-spark-notebook
oc create -f https://raw.githubusercontent.com/GrahamDumpleton/openshift3-jupyter-stacks/master/openshift-templates/all-spark-notebook.json
datascience-notebook
oc create -f https://raw.githubusercontent.com/GrahamDumpleton/openshift3-jupyter-stacks/master/openshift-templates/datascience-notebook.json
minimal-notebook
oc create -f https://raw.githubusercontent.com/GrahamDumpleton/openshift3-jupyter-stacks/master/openshift-templates/minimal-notebook.json
pyspark-notebook
oc create -f https://raw.githubusercontent.com/GrahamDumpleton/openshift3-jupyter-stacks/master/openshift-templates/pyspark-notebook.json
r-notebook
oc create -f https://raw.githubusercontent.com/GrahamDumpleton/openshift3-jupyter-stacks/master/openshift-templates/r-notebook.json
scipy-notebook
oc create -f https://raw.githubusercontent.com/GrahamDumpleton/openshift3-jupyter-stacks/master/openshift-templates/scipy-notebook.json
To use the templates, once loaded, from the web console, use Add to Project within the project you want to use.
To use the templates from the command line, determine the names using oc get templates.
$ oc get templates
NAME DESCRIPTION PARAMETERS OBJECTS
minimal-notebook-app Jupyter (minimal-notebook). 4 (3 blank) 5
You can see the parameters using the oc describe command.
$ oc describe template minimal-notebook-app
Name: minimal-notebook-app
Created: 15 minutes ago
Labels: <none>
Description: Jupyter (minimal-notebook).
Annotations: iconClass=instant-app
tags=instant-app
Parameters:
Name: APPLICATION_NAME
Display Name: Name
Description: Identifies the resources created for this application.
Required: true
Value: <none>
Name: REPOSITORY_URL
Display Name: Git Repository URL
Description: Repository for your Jupyter Notebooks and data.
Required: true
Value: <none>
Name: USER_PASSWORD
Display Name: User Password
Description: User password for accessing Jupyter Notebook.
Required: true
Generated: expression
From: [a-zA-Z0-9]{8}
Name: ROUTE_HOSTNAME
Display Name: Hostname
Description: Public hostname for the route. If not specified, a hostname is generated.
Required: false
Value: <none>
Object Labels: <none>
Objects:
ImageStream ${APPLICATION_NAME}
BuildConfig ${APPLICATION_NAME}
DeploymentConfig ${APPLICATION_NAME}
Service ${APPLICATION_NAME}
Route ${APPLICATION_NAME}
The oc new-app command can then be used to create an instance of the Jupyter Notebook for that image.
$ oc new-app minimal-notebook-app --param APPLICATION_NAME=my-notebook --param REPOSITORY_URL=https://github.com/jrjohansson/scientific-python-lectures.git
--> Deploying template minimal-notebook-app for "minimal-notebook-app"
With parameters:
Name=my-notebook
Git Repository URL=https://github.com/jrjohansson/scientific-python-lectures.git
User Password=XkenopKc # generated
Hostname=
--> Creating resources with label app=my-notebook ...
imagestream "my-notebook" created
buildconfig "my-notebook" created
deploymentconfig "my-notebook" created
service "my-notebook" created
route "my-notebook" created
--> Success
Build scheduled, use 'oc logs -f bc/my-notebook' to track its progress.
Run 'oc status' to view your app.
When a password is not specified, you can see the auto-generated password in the output of oc new-app. In the web console you can see it in the environment variables associated with the deployment configuration.
By default any notebooks are not persistent. If the pod for the instance is destroyed, any work will be lost.
If you wish for data to be persistent, you will need to make a persistent volume claim and then mount the volume into the instance. The volume should be mounted on the directory /home/jovyan/work.
If you already have an empty Jupyter Notebook instance running, you can use the following command to claim a persistent volume and mount it.
oc set volume dc/my-notebook --add --name=my-notebooks -t pvc --claim-size=1G --overwrite --mount-path=/home/jovyan/work