docker-jupyter

Dockerfile for running example Senzing Jupyter notebooks.

Overview

The docker-jupyter repository holds example Senzing Jupyter notebooks in the notebooks subdirectory.

The senzing/jupyter docker image is a Senzing-ready image hosting the example Senzing notebooks.

The senzing/jupyter image is built upon the Jupyter organization's docker images on DockerHub. The default base image is jupyter/minimal-notebook. More information is available in the Jupyter Docker Stacks documentation.

The Jupyter notebooks can also be viewed on nbviewer.jupyter.org. For example, visit Senzing examples on nbviewer.

Related artifacts

  1. DockerHub

Contents

  1. Expectations
    1. Space
    2. Time
    3. Background knowledge
  2. Demonstrate using Docker
    1. Initialize Senzing
    2. Configuration
    3. Volumes
    4. Docker network
    5. Database support
    6. Run docker container
    7. Run Jupyter
    8. Guides and References
  3. Develop
    1. Prerequisite software
    2. Clone repository
    3. Develop notebooks on host system
    4. Build docker image for development
  4. Examples
  5. Errors
  6. References

Legend

  1. 🤔 - A "thinker" icon means that a little extra thinking may be required. Perhaps you'll need to make some choices. Perhaps it's an optional step.
  2. ✏️ - A "pencil" icon means that the instructions may need modification before performing.
  3. ⚠️ - A "warning" icon means that something tricky is happening, so pay attention.

Expectations

Space

This repository and demonstration require 9 GB of free disk space.

Time

Budget 40 minutes to get the demonstration up and running, depending on CPU and network speeds.

Background knowledge

This repository assumes a working knowledge of:

  1. Jupyter
  2. Docker

Demonstrate using Docker

Initialize Senzing

  1. If Senzing has not been initialized, visit "How to initialize Senzing with Docker".

Configuration

Configuration values are specified by environment variable or command-line parameter.

For non-Senzing configuration, see Jupyter Docker Stacks.

Volumes

🤔 "How to initialize Senzing with Docker" places files in different directories. The following examples show how to identify each output directory.

  1. Example #1: To mimic an actual RPM installation, identify directories for RPM output in this manner:

    export SENZING_DATA_VERSION_DIR=/opt/senzing/data/1.0.0
    export SENZING_ETC_DIR=/etc/opt/senzing
    export SENZING_G2_DIR=/opt/senzing/g2
    export SENZING_VAR_DIR=/var/opt/senzing
  2. ✏️ Example #2: If Senzing files were placed in alternative directories, set environment variables to reflect where they were placed. Example:

    export SENZING_VOLUME=/opt/my-senzing
    
    export SENZING_DATA_VERSION_DIR=${SENZING_VOLUME}/data/1.0.0
    export SENZING_ETC_DIR=${SENZING_VOLUME}/etc
    export SENZING_G2_DIR=${SENZING_VOLUME}/g2
    export SENZING_VAR_DIR=${SENZING_VOLUME}/var
  3. 🤔 If the internal database is used, permissions on ${SENZING_VAR_DIR} (/var/opt/senzing in Example #1) may need to be changed. Example:

    sudo chown $(id -u):$(id -g) -R ${SENZING_VAR_DIR}
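Because the docker run command below mounts each of these directories, a typo in any variable only surfaces later as a confusing container error. A minimal bash sketch that checks the four variables up front; check_senzing_dirs is a hypothetical helper, not part of this repository:

```shell
# Hypothetical helper: report any Senzing directory variable that is unset
# or does not name an existing directory. Returns non-zero if any check fails.
check_senzing_dirs() {
  local status=0
  local name
  for name in SENZING_DATA_VERSION_DIR SENZING_ETC_DIR SENZING_G2_DIR SENZING_VAR_DIR; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "${name} is not set" >&2
      status=1
    elif [ ! -d "${!name}" ]; then
      echo "${name}=${!name} is not a directory" >&2
      status=1
    fi
  done
  return "${status}"
}
```

Usage: `check_senzing_dirs && echo "Senzing directories look OK"`.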

Docker network

🤔 Optional: Use if the docker container is part of a docker network.

  1. List docker networks. Example:

    sudo docker network ls
  2. ✏️ Specify the docker network. Choose a value from the NAME column of docker network ls. Example:

    export SENZING_NETWORK=*nameofthe_network*
  3. Construct parameter for docker run. Example:

    export SENZING_NETWORK_PARAMETER="--net ${SENZING_NETWORK}"
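Step 3 always produces `--net`, even when SENZING_NETWORK is empty, which would leave docker run with a dangling flag. A minimal sketch of an alternative to step 3 (an assumption, not the original instructions) that uses bash's `${parameter:+word}` expansion so the parameter is produced only when a network name has been chosen:

```shell
# Expands to "--net <name>" only when SENZING_NETWORK is set and non-empty;
# otherwise SENZING_NETWORK_PARAMETER is empty and vanishes from the unquoted
# ${SENZING_NETWORK_PARAMETER} in the docker run command.
export SENZING_NETWORK_PARAMETER="${SENZING_NETWORK:+--net ${SENZING_NETWORK}}"
```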

Database support

🤔 Optional: Some databases need additional support. For other databases, these steps may be skipped.

  1. Db2: See Support Db2 instructions to set SENZING_OPT_IBM_DIR_PARAMETER.
  2. MS SQL: See Support MS SQL instructions to set SENZING_OPT_MICROSOFT_DIR_PARAMETER.
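The docker run command under "Run docker container" expands these optional variables unquoted, so an unset variable normally just disappears from the command line. In a shell or script running with `set -u` (nounset), however, referencing an unset variable is an error. A minimal sketch, assuming such a shell, that defaults the optional parameters to empty strings without overwriting values already set by the Db2, MS SQL, or network steps:

```shell
# ":=" assigns an empty string only if the variable is unset or empty,
# so any value set earlier is kept. ":" is the no-op builtin that lets
# the expansion run for its side effect.
: "${SENZING_NETWORK_PARAMETER:=}"
: "${SENZING_OPT_IBM_DIR_PARAMETER:=}"
: "${SENZING_OPT_MICROSOFT_DIR_PARAMETER:=}"
: "${JUPYTER_PARAMETERS:=}"
```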

Run docker container

  1. ✏️ Set environment variables. Example:

    export JUPYTER_NOTEBOOKS_SHARED_DIR=$(pwd)
    export WEBAPP_PORT=8888
  2. 🤔 Optional: Run Jupyter without token authentication. Example:

    export JUPYTER_PARAMETERS="start.sh jupyter notebook --NotebookApp.token=''"
  3. Run docker container. Example:

    sudo docker run \
      --interactive \
      --name test-senzing-jupyter \
      --publish ${WEBAPP_PORT}:8888 \
      --rm \
      --tty \
      --volume ${JUPYTER_NOTEBOOKS_SHARED_DIR}:/notebooks/shared \
      --volume ${SENZING_DATA_VERSION_DIR}:/opt/senzing/data \
      --volume ${SENZING_ETC_DIR}:/etc/opt/senzing \
      --volume ${SENZING_G2_DIR}:/opt/senzing/g2 \
      --volume ${SENZING_VAR_DIR}:/var/opt/senzing \
      ${SENZING_NETWORK_PARAMETER} \
      ${SENZING_OPT_IBM_DIR_PARAMETER} \
      ${SENZING_OPT_MICROSOFT_DIR_PARAMETER} \
      senzing/jupyter ${JUPYTER_PARAMETERS}
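Because the container runs in the foreground with `--interactive --tty`, a second terminal is needed to check whether Jupyter is ready. A minimal bash sketch, assuming curl is installed on the host; wait_for_jupyter is a hypothetical helper that polls the published port:

```shell
# Hypothetical helper: poll the published port until Jupyter answers HTTP,
# or give up after a number of attempts (default 30, one second apart).
wait_for_jupyter() {
  local port="${1:-${WEBAPP_PORT:-8888}}"
  local attempts="${2:-30}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # --fail treats HTTP status >= 400 as failure; a redirect to the
    # login page (token authentication) still counts as "Jupyter is up".
    if curl --silent --fail --output /dev/null "http://127.0.0.1:${port}/"; then
      echo "Jupyter is responding on port ${port}"
      return 0
    fi
    sleep 1
  done
  echo "Jupyter did not respond on port ${port} after ${attempts} attempts" >&2
  return 1
}
```

Usage: `wait_for_jupyter "${WEBAPP_PORT}"`.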

Run Jupyter

  1. If token authentication is disabled, access the Jupyter notebooks at http://127.0.0.1:8888/ (replace 8888 with the WEBAPP_PORT value if it was changed).

  2. If token authentication is enabled, locate the URL in the docker container's log output. Example:

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://(a152e5586fdc or 127.0.0.1):8888/?token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

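    Replace the host portion of the URL with 127.0.0.1 and, if WEBAPP_PORT is not 8888, replace 8888 with the WEBAPP_PORT value. Example: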

    http://127.0.0.1:8888/?token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

    Paste the URL into a web browser.
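The URL adjustment above can be scripted. A minimal sketch, assuming the log format shown above (`http://(<container-id> or 127.0.0.1):8888/?token=...`); jupyter_local_url is a hypothetical helper that reads log text on stdin and prints the first token URL rewritten for the host:

```shell
# Hypothetical helper: read Jupyter log text on stdin and print the first
# token URL rewritten to use 127.0.0.1 and the host port (WEBAPP_PORT, default 8888).
jupyter_local_url() {
  local port="${WEBAPP_PORT:-8888}"
  # Keep only the token from "http://<anything>:8888/?token=<token>" and
  # rebuild the URL for the host side of --publish ${WEBAPP_PORT}:8888.
  sed -n "s#.*http://[^/]*:8888/?token=\([[:alnum:]]*\).*#http://127.0.0.1:${port}/?token=\1#p" | head -n 1
}
```

Usage, from a second terminal: `sudo docker logs test-senzing-jupyter 2>&1 | jupyter_local_url`.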

Guides and References

The Jupyter notebooks in notebooks/senzing-examples are of two types:

  1. References - Information on specific method invocations and their parameters. Examples:
    1. G2Config reference
    2. G2Engine reference
  2. Guides - Illustrations of how to use methods to accomplish tasks. Guides often point to the appropriate "Reference" entries for specific method invocations. Examples:
    1. G2Config add data source
    2. G2Engine add record

Develop

Prerequisite software

The following software programs need to be installed:

  1. git
  2. make
  3. docker
  4. Jupyter Notebook

Clone repository

For more information on environment variables, see Environment Variables.

  1. Set these environment variable values:

    export GIT_ACCOUNT=senzing
    export GIT_REPOSITORY=docker-jupyter
    export GIT_ACCOUNT_DIR=~/${GIT_ACCOUNT}.git
    export GIT_REPOSITORY_DIR="${GIT_ACCOUNT_DIR}/${GIT_REPOSITORY}"
  2. Follow the steps in clone-repository to clone the Git repository.
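A minimal sketch of what cloning with the variables above might look like, assuming the clone-repository steps reduce to "create GIT_ACCOUNT_DIR, then git clone from GitHub into GIT_REPOSITORY_DIR". Both function names are hypothetical; the GitHub URL pattern matches the one used in the docker build example further down:

```shell
# Hypothetical helper: the GitHub clone URL built from GIT_ACCOUNT and GIT_REPOSITORY.
senzing_repo_url() {
  echo "https://github.com/${GIT_ACCOUNT}/${GIT_REPOSITORY}.git"
}

# Hypothetical helper: clone into GIT_REPOSITORY_DIR unless it already exists.
clone_senzing_repo() {
  # Skip the clone if the target already looks like a Git working tree.
  if [ -d "${GIT_REPOSITORY_DIR}/.git" ]; then
    echo "${GIT_REPOSITORY_DIR} already exists; skipping clone"
    return 0
  fi
  mkdir -p "${GIT_ACCOUNT_DIR}"
  git clone "$(senzing_repo_url)" "${GIT_REPOSITORY_DIR}"
}
```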

Develop notebooks on host system

  1. Set environment variables for Senzing directories. See Volumes. Example:

    export SENZING_VOLUME=/opt/my-senzing
    
    export SENZING_DATA_DIR=${SENZING_VOLUME}/data
    export SENZING_DATA_VERSION_DIR=${SENZING_DATA_DIR}/1.0.0
    export SENZING_ETC_DIR=${SENZING_VOLUME}/etc
    export SENZING_G2_DIR=${SENZING_VOLUME}/g2
    export SENZING_VAR_DIR=${SENZING_VOLUME}/var
  2. Set environment variables. Example:

    export PYTHONPATH=${SENZING_G2_DIR}/python
    export LD_LIBRARY_PATH=${SENZING_G2_DIR}/lib:${SENZING_G2_DIR}/lib/debian
    export SENZING_SQL_CONNECTION="sqlite3://na:na@${SENZING_VAR_DIR}/sqlite/G2C.db"
  3. Start Jupyter Notebook. Example:

    cd ${GIT_REPOSITORY_DIR}
    
    jupyter notebook
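If notebooks on the host cannot import the Senzing Python modules or open the database, the variables from steps 1 and 2 are the first thing to check. A minimal bash sketch; check_host_env is a hypothetical helper, and it assumes the SQLite connection string has the `sqlite3://na:na@<path>` form shown in step 2:

```shell
# Hypothetical helper: sanity-check the directories and files referenced by
# PYTHONPATH, LD_LIBRARY_PATH, and SENZING_SQL_CONNECTION. Prints each
# problem to stderr and returns non-zero if any are found.
check_host_env() {
  local status=0
  [ -d "${SENZING_G2_DIR}/python" ] || { echo "missing ${SENZING_G2_DIR}/python" >&2; status=1; }
  [ -d "${SENZING_G2_DIR}/lib" ] || { echo "missing ${SENZING_G2_DIR}/lib" >&2; status=1; }
  # For sqlite3://na:na@/path/G2C.db, the file path is everything after the "@".
  local db_path="${SENZING_SQL_CONNECTION#*@}"
  [ -f "${db_path}" ] || { echo "missing SQLite file ${db_path}" >&2; status=1; }
  return "${status}"
}
```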

Build docker image for development

  1. Option #1: Using docker command and GitHub.

    sudo docker build --tag senzing/jupyter https://github.com/senzing/docker-jupyter.git
  2. Option #2: Using docker command and local repository.

    cd ${GIT_REPOSITORY_DIR}
    sudo docker build --tag senzing/jupyter .
  3. Option #3: Using make command.

    cd ${GIT_REPOSITORY_DIR}
    sudo make docker-build

    Note: sudo make docker-build-development-cache can be used to create cached docker layers.
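Whichever option is used, the result should be a local image tagged senzing/jupyter. A minimal sketch for confirming that; image_exists is a hypothetical helper built on `docker image inspect`, which exits non-zero when the image is not found:

```shell
# Hypothetical helper: succeed if the given tag (default senzing/jupyter)
# exists in the local docker image store.
image_exists() {
  sudo docker image inspect "${1:-senzing/jupyter}" > /dev/null 2>&1
}
```

Usage: `image_exists senzing/jupyter && echo "senzing/jupyter is available"`.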

Examples

Errors

  1. See docs/errors.md.

References

  1. A gallery of interesting Jupyter Notebooks
  2. Senzing notebooks