⚠️ This repo is no longer maintained, and hence archived ⚠️

Automatically deploy a BinderHub to Google Cloud

BinderHub is a cloud-based, multi-server technology used for hosting reproducible computing environments and interactive Jupyter Notebooks built from code repositories.

This repository contains a set of scripts to automatically deploy a BinderHub onto Google Cloud and connect a Docker Hub account/organisation, so that you can host your own Binder service.

This repository is based on the "Deploy to Azure" repo alan-turing-institute/binderhub-deploy.

You will require a Google Cloud account and project. A Free Trial project can be obtained here. You will be asked to provide a credit card for verification purposes. You will not be charged. Your resources will be frozen once your trial expires, then deleted if you do not reactivate your account within a given time period. If you are building a BinderHub as a service for an organisation, your institution may already have a Google Cloud account.

🚸 Usage

To use these scripts locally, clone this repo and change into the directory.

git clone https://github.com/alan-turing-institute/binderhub-deploy-gke.git
cd binderhub-deploy-gke

To run a script, do the following:

./src/<script-name>.sh

To build the BinderHub, you should run setup.sh first (to install the required command line tools), then deploy.sh (which will build the BinderHub). Once the BinderHub is deployed, you can run logs.sh and info.sh to get the JupyterHub logs and service IP addresses respectively. teardown.sh should only be used to delete your BinderHub deployment.
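The order described above can be sketched as a small wrapper. This is an illustrative sketch rather than part of the repository: `run_in_order` is a hypothetical helper name, and it assumes the scripts are Bash scripts living under `./src`, as in the command shown earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): run the named scripts
# from a directory in the given order and stop at the first failure.
run_in_order() {
  local script_dir="$1"
  shift
  local name
  for name in "$@"; do
    echo "==> ${name}.sh"
    bash "${script_dir}/${name}.sh" || {
      echo "${name}.sh failed; stopping" >&2
      return 1
    }
  done
}

# Usage (from the repository root): install the tools, deploy, then print
# the service IP addresses.
#   run_in_order ./src setup deploy info
```

Stopping at the first failure matters here because deploy.sh depends on the tools that setup.sh installs.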

You need to create a file called config.json which has the format described in the code block below. Fill the quotation marks with your desired namespaces, etc. config.json is git-ignored so sensitive information, such as passwords and Service Accounts, cannot be pushed to GitHub.

  • For a list of available data centre regions and zones, see here. This should be something like us-central1 for a region and us-central1-a for a zone.
  • For a list of available Linux Virtual Machines, see here. This should be something like n1-standard-2.
  • The versions of the BinderHub Helm Chart can be found here and are of the form 0.2.0-<commit-hash>. It is advised to select the most recent version unless you specifically require an older one.
{
  "binderhub": {
    "name": "",              // Name of your BinderHub
    "version": "",           // Helm chart version to deploy, should be 0.2.0-<commit-hash>
    "image_prefix": ""       // The prefix to prepend to Docker images (e.g. "binder-prod")
  },
  "docker": {
    "username": null,        // Docker username (can be supplied at runtime)
    "password": null,        // Docker password (can be supplied at runtime)
    "org": null              // A Docker Hub organisation to push images to (optional)
  },
  "gcp": {
    "email_account": "",     // Your Google Account Email address
    "project_id": "",        // The numerical ID of your Google Cloud project
    "credentials_file": "",  // Path to your Google Cloud Service Account credentials in JSON format
    "region": "",            // The region to deploy your BinderHub to
    "zone": ""               // The zone (within above region) to deploy your BinderHub to
  },
  "gke": {
    "node_count": 1,         // The number of nodes to deploy in the Kubernetes cluster (3 is recommended)
    "machine_type": ""       // The VM type to deploy in the Kubernetes cluster
  }
}

You can copy template-config.json and use it as a starting point.

Please note that all entries in template-config.json must be surrounded by double quotation marks ("), with the exception of node_count or if the value is null.
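Before deploying, it can help to check that no required field has been left empty. The following is a minimal sketch, assuming `jq` is installed; this README does not state how the scripts themselves parse config.json, and `missing_config_fields` is a hypothetical helper. It treats every field as required except the `docker` entries, which can be supplied at runtime. Note that the `//` comments in the block above are annotations only: a real config.json must be plain JSON without comments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the dotted path of every required config.json
# field that is missing, null, or an empty string. Assumes jq is available.
missing_config_fields() {
  local config_file="$1"
  jq -r '
    [ "binderhub.name", "binderhub.version", "binderhub.image_prefix",
      "gcp.email_account", "gcp.project_id", "gcp.credentials_file",
      "gcp.region", "gcp.zone", "gke.machine_type" ] as $required
    | . as $cfg
    | $required[]
    | . as $path
    | select(($cfg | getpath($path | split("."))) | . == null or . == "")
  ' "$config_file"
}

# Usage: prints nothing when every required field is filled in.
#   missing_config_fields config.json
```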

🔑 Create a Service Account key

This script will access your Google Cloud account using a Service Account key. Create one now in the console using the following settings:

  1. Select the project you are going to use (in the blue bar along the top of the browser window).
  2. Under "Service account", select "New service account".
  3. Give it any name you like!
  4. For the Role, choose "Project -> Editor".
  5. Leave the "Key Type" as JSON.
  6. Click "Create" to create the key and save the key file to your system.

You will provide the path to this file under credentials_file in config.json described above.

🚨 The service account key file provides access to your Google Cloud project. It should be treated like any other secret credential. Specifically, it should never be checked into source control. 🚨
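The console steps above have a rough command-line equivalent in the Google Cloud CLI. The sketch below only prints the commands so they can be reviewed before being run by hand; `print_service_account_commands` is a hypothetical helper, and the project ID, account name, and key file path are placeholders you supply. JSON is the default key type for `gcloud iam service-accounts keys create`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not execute) gcloud commands that mirror the
# console steps: create the account, grant Project -> Editor, create a key.
print_service_account_commands() {
  local project_id="$1" account_name="$2" key_file="$3"
  local member="${account_name}@${project_id}.iam.gserviceaccount.com"
  cat <<EOF
gcloud iam service-accounts create ${account_name} --project=${project_id}
gcloud projects add-iam-policy-binding ${project_id} --member=serviceAccount:${member} --role=roles/editor
gcloud iam service-accounts keys create ${key_file} --iam-account=${member}
EOF
}

# Usage:
#   print_service_account_commands <project-id> <account-name> <key-file.json>
```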

🚦 setup.sh

This script checks whether the required command line tools are already installed. If any are missing, the script uses the system package manager or curl to install the command line interfaces (CLIs). The CLIs to be installed are:

Any dependencies that are not automatically installed by these packages will also be installed.
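The "check whether a tool is already installed" step can be sketched with `command -v`. This is not the code from setup.sh; `report_missing_tools` is a hypothetical helper, and the tool names in the usage line are an assumption drawn from tools mentioned elsewhere in this README, not the script's actual list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the named command line tools are not
# on PATH. Returns 0 when every tool is present, 1 otherwise.
report_missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: ${tool}"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (tool names are illustrative assumptions):
#   report_missing_tools gcloud terraform helm || echo "run ./src/setup.sh first"
```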

🚀 deploy.sh

This script reads in values from config.json and deploys a Kubernetes cluster. It then creates config.yaml and secret.yaml files which are used to install the BinderHub using the templates in the templates folder.

The script will ask for your Docker ID and password if you haven't supplied them in the config file. The ID is your Docker username, NOT the associated email. If you have provided a Docker organisation in config.json, then this Docker ID MUST be a member of that organisation.

Both a JupyterHub and BinderHub are installed via a Helm Chart onto the deployed Kubernetes cluster and the config.yaml file is updated with the JupyterHub IP address.

config.yaml and secret.yaml are both git-ignored so that secrets cannot be pushed back to GitHub.

The script also outputs log files (<file-name>.log) for each stage of the deployment. These files are also git-ignored.

📊 logs.sh

This script will print the JupyterHub logs to the terminal to assist with debugging issues with the BinderHub. It reads from config.json in order to get the BinderHub name.

ℹ️ info.sh

This script will print the pod status of the Kubernetes cluster and the IP addresses of both the JupyterHub and BinderHub to the terminal. It reads the BinderHub name from config.json.

⬆️ upgrade.sh

This script will automatically upgrade the Helm Chart deployment configuring the BinderHub and then prints the Kubernetes pods. It reads the BinderHub name and Helm Chart version from config.json.

💥 teardown.sh

This script runs the terraform destroy -auto-approve command to destroy all deployed resources. It reads the terraform.tfstate file (which is git-ignored) in the terraform directory. Afterwards, check the Google Cloud Console to verify that the resources have been deleted.

🏡 Running the Container Locally

Another way to deploy BinderHub to Google Cloud is to pull the Docker image and run it directly, passing the values you would have entered in config.json as environment variables.

You will need the Docker CLI installed. Installation instructions can be found here.

⬆️ Updating your Service Account

To deploy the BinderHub without your local authentication details, we need to grant an extra role to the Service Account you created in "Create a Service Account key".

  1. On the IAM page of the Google Cloud console, edit the Service Account you created. Do this by selecting the pencil icon to the right of the account.
  2. Select "+ Add Another Role"
  3. Search for and add the "Kubernetes Engine Admin" role
  4. Click "Save"
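The same role can be granted from the command line. In the Google Cloud CLI, the "Kubernetes Engine Admin" role has the identifier `roles/container.admin`. The sketch below only prints the command for review; `print_gke_admin_binding` is a hypothetical helper, and both arguments are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not execute) the gcloud command that grants
# the Kubernetes Engine Admin role to an existing Service Account.
print_gke_admin_binding() {
  local project_id="$1" service_account_email="$2"
  echo "gcloud projects add-iam-policy-binding ${project_id}" \
       "--member=serviceAccount:${service_account_email}" \
       "--role=roles/container.admin"
}

# Usage:
#   print_gke_admin_binding <project-id> <account>@<project-id>.iam.gserviceaccount.com
```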

🐳 Running the Container

First, pull the binderhub-setup-gke image.

docker pull sgibson91/binderhub-setup-gke:<TAG>

where <TAG> is your chosen image tag.

A list of available tags can be found here. It is recommended to use the most recent version number. The latest tag is the most recent build from the default branch and may be subject to fluctuations.

Then, run the container with the following arguments, replacing the <> fields as necessary:

# Every variable below is required except DOCKER_ORG, which is optional.
docker run \
-e "CONTAINER_MODE=true" \
-e "BINDERHUB_NAME=<Chosen BinderHub Name>" \
-e "BINDERHUB_VERSION=<Chosen BinderHub Version>" \
-e "DOCKER_ORG=<Docker Hub Organisation>" \
-e "DOCKER_USERNAME=<DOCKER ID>" \
-e "DOCKER_PASSWORD=<Docker Password>" \
-e "GCP_ACCOUNT_EMAIL=<Google Email Account>" \
-e "GCP_PROJECT_ID=<Google Project ID>" \
-e "GCP_REGION=<Google Cloud Region>" \
-e "GCP_ZONE=<Google Cloud Zone>" \
-e "GKE_NODE_COUNT=3" \
-e "GKE_MACHINE_TYPE=n1-standard-2" \
-e "IMAGE_PREFIX=binder-dev" \
-v "<Path to Service Account key file>:/app/key_file.json" \
-it sgibson91/binderhub-setup-gke:<TAG>

The output will be printed to your terminal and the files will be pushed to a storage bucket. See the Retrieving Deployment Output section for how to return these files.
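As an alternative to a long list of `-e` flags, `docker run` also accepts an `--env-file` option that reads `NAME=value` lines from a file. The sketch below writes such a file from variables already set in your shell and refuses to continue if one is unset. `write_env_file` is a hypothetical helper, and the file name in the usage note is an assumption; because the file holds your Docker password, it is created readable only by you and should never be committed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write NAME=value lines for the named shell variables
# to an env file suitable for `docker run --env-file`. Fails if any variable
# is unset or empty. Values are written verbatim (no quoting).
write_env_file() {
  local out_file="$1"
  shift
  local name
  : > "$out_file" && chmod 600 "$out_file" || return 1
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "variable ${name} is not set" >&2
      return 1
    fi
    printf '%s=%s\n' "$name" "${!name}" >> "$out_file"
  done
}

# Usage (after setting the variables in your shell):
#   write_env_file binderhub.env CONTAINER_MODE BINDERHUB_NAME BINDERHUB_VERSION \
#     DOCKER_USERNAME DOCKER_PASSWORD GCP_ACCOUNT_EMAIL GCP_PROJECT_ID \
#     GCP_REGION GCP_ZONE GKE_NODE_COUNT GKE_MACHINE_TYPE IMAGE_PREFIX
#   docker run --env-file binderhub.env \
#     -v "<Path to Service Account key file>:/app/key_file.json" \
#     -it sgibson91/binderhub-setup-gke:<TAG>
```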

📦 Retrieving Deployment Output

When BinderHub is deployed using a local container, output logs, YAML files, and the terraform state file are pushed to a Google storage bucket to preserve them once the container exits. The storage bucket is created in the same project as the Kubernetes cluster.

The storage bucket name is derived from the name you gave to your BinderHub instance, but may be modified and/or have a random seed appended. The Google Cloud CLI can be used to find the bucket and download its contents. It can be installed by running the setup.sh script.

To find the storage bucket name, run the following command.

gsutil ls

To download all files from the bucket:

gsutil -m cp -r gs://${STORAGE_BUCKET_NAME} ./

Make sure the terraform state file is moved to the terraform folder!

mv ${STORAGE_BUCKET_NAME}/terraform.tfstate ./terraform

For full documentation, see "Cloud Storage: Downloading Objects".
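If the project contains many buckets, the `gsutil ls` output can be filtered by your BinderHub name. This is a minimal sketch: `match_buckets` is a hypothetical helper, and it assumes the bucket name still contains the BinderHub name as a substring (matched case-insensitively). Since the name may have been modified, fall back to reading the full `gsutil ls` output if nothing matches.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `gsutil ls` output (one gs://bucket/ URL per
# line) on stdin and print the bucket names that contain the given text.
match_buckets() {
  local hub_name="$1"
  sed -e 's|^gs://||' -e 's|/$||' | grep -F -i -- "$hub_name"
}

# Usage:
#   STORAGE_BUCKET_NAME="$(gsutil ls | match_buckets "<BinderHub name>")"
```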

🔓 Accessing your BinderHub after Deployment

Once the deployment has succeeded and you've downloaded the log files, visit the IP address of your Binder page to check that it is working.

The Binder IP address can be found by running the following:

cat binder-ip.log
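A quick reachability check can be scripted as well. The sketch below assumes that binder-ip.log contains only the IP address and that the Binder page is served over plain HTTP; both are assumptions, and `binder_url` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Binder page URL from the IP address stored
# in a log file (default: binder-ip.log). Fails if the file is empty.
binder_url() {
  local ip
  ip="$(tr -d '[:space:]' < "${1:-binder-ip.log}")" || return 1
  [ -n "$ip" ] || return 1
  echo "http://${ip}"
}

# Usage: request only the response headers and fail on an HTTP error.
#   curl --fail --silent --show-error --head "$(binder_url)"
```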

A good repository to test your BinderHub with is binder-examples/requirements.

🎨 Customising your BinderHub Deployment

Customising your BinderHub deployment is as simple as editing config.yaml and/or secret.yaml and then upgrading the BinderHub Helm Chart. The Helm Chart can be upgraded by running upgrade.sh (make sure you have the CLIs installed by running setup.sh first).

The Jupyter guide to customising the underlying JupyterHub can be found here.

The BinderHub guide for changing the landing page logo can be found here.

✨ Contributors

Thanks goes to these wonderful people (emoji key):


Sarah Gibson

🐛 💻 📖 🤔 🚇 🚧 📦 📆 ⚠️ 🔧

Min RK

💻 🤔 🔧

This project follows the all-contributors specification. Contributions of any kind welcome!