Docker_SPARC

A Docker image with JupyterLab, Python 3.6, Python 2.7, and R


What is Docker?

Docker is a software system that uses containers. Containers are isolated user-defined instances that operate as separate computers and have various levels of restricted access to the host computer's resources (e.g., shared folders). More than one container can run simultaneously on a host, communicating with the host and other containers.

A Docker container is a running instance of a Docker image.
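
For example, using the small, publicly available "hello-world" image purely as an illustration, the first command below downloads an image and the second starts a container from it:

docker pull hello-world
docker run hello-world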

Advantages of Docker:

There are two significant advantages when using Docker images for science projects:

1. Easy installation.

Docker GREATLY reduces the time (and expertise) needed to install, run, and maintain software, especially analytic software built on Python, R, and other open-source resources such as Jupyter notebooks. Jupyter runs in a web browser, provides access to multiple programming kernels, and is a good way to store notes and analyses for sharing and reproduction. A large collection of pre-built Docker images is available for download from Docker Hub, a cloud service similar to GitHub but designed for maintaining Docker images.

2. Reproducible/shareable computing environments.

By using and sharing a specific Docker image for analysis, subsequent users can run the original versions of the software and recreate the analysis or extend the work to new problems.

How to get started using Docker:

Docker can run on Windows, macOS, and Linux. Installation of Docker on Windows and macOS comes with a lightweight Linux virtual environment. Follow these installation steps:

1. Install Docker ...

for Windows, macOS, or a popular flavor of Linux (Ubuntu, Debian, CentOS, or Fedora). You need the free Community Edition of Docker.

2. Get a free Docker ID ...

Go to Docker cloud to sign up. Signing in to the cloud service lets you search for and download Docker images.
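
For example, you can also sign in from the terminal; the "docker login" command will prompt for your Docker ID and password:

docker login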

Basic Docker commands:

You can walk through the official Docker tutorial, which explains basic commands and provides test images. There is also a command cheatsheet.

Here are frequently used terminal commands to list the Docker images installed on your machine, to list containers, and to run a new container:

To list installed docker images on your local machine ...

docker images

To list running and stopped containers ...

docker ps -a

To run a container from an image (you can add flags to this command, e.g., to share folders and map ports between the container and the host computer; more about this later) ...

docker run [image name]

What's in the SPARC-supplied Docker image?

The "sparc:jupyter_V1.5" image runs Ubuntu 18.04 Linux distribution with the following analytic software:
NOTE: To maintain compatibility, some software is kept at slightly less than the current version; also, the list of Python 2 and R packages is short to reduce the size of the docker image, additional packages can be installed and a new Docker image can be saved to meet the needs of each user.

  • Python 3.6.6
    Python is a high-level interpreted general-purpose programming language. Entering the command "python" in the container's terminal will start Python 3 on the command line.

  • Python 2.7.15
    Entering "python2" at the command line will start Python 2, which provides usage for a multitude of older Python 2 software.

  • R 3.4.4
    R is a software environment for statistical computing and graphics. Entering "R" on the command line starts R from the terminal.

  • Jupyter notebook 4.4.0
Jupyter notebook is an open-source web application (it can run on your local machine or on a remote server) for creating shareable notebooks that contain text notes, programming code, and graphics. These notebooks typically run Python code, but many other languages, including R, can also be used.

  • Jupyter lab 0.35.2
Jupyter lab is a new interface for Jupyter notebooks (it still supports using notebooks in the traditional way, i.e., one notebook per browser tab). The Jupyter lab interface greatly increases functionality by providing a file browser, many cutting-edge extensions, and the ability to arrange multiple panels in a single browser tab (e.g., a terminal, a notebook, and images side by side).

  • JupyterLab extensions
    Several Jupyter lab notebook extensions are included, with others available through installation (https://github.com/topics/jupyterlab-extension). The installed extensions include:
    ---new interface tabs---
    table of contents (table of contents for Jupyter notebooks)
    cell tags (descriptive tags can be added to notebook cells)
    google drive (google drive collaboration)
    github (file browser for github repos)
    git (version control)
    ---plotting and graphics---
    plotly (plotting)
    geojson (geo mapping extension)
    drawio (a vector drawing program)
    bokeh (plotting)
    ---file format rendering---
    html (renders html documents)

  • Python packages:
    ---numerical manipulations and dataframes---
    numpy (numerical arrays), Python 2 & 3
    pandas (dataframes and numerical analysis), Python 2 & 3
    scipy (efficient numerical routines), Python 2 & 3
    ---plotting and visualizations---
    matplotlib, Python 2 & 3
    ggplot (a Python port of the popular R ggplot2 package), Python 2 & 3
    seaborn (statistical data visualization built on top of matplotlib), Python 2 & 3
    plotly, Python 2 & 3
    bokeh, Python 2 & 3
    altair, Python 3
    ---statistics and machine learning---
    statsmodels (statistics), Python 2 & 3
    scikit-learn (machine learning), Python 3
    ---image analysis---
    scikit-image, Python 3
    ---electrophysiology---
    neo (electrophysiology data conversion), Python 2 & 3
    pybursts (algorithm to detect activity bursts in time series data; Python implementation of the R "bursts" package), Python 2
    ---misc---
    rpy2 (interface between Python and R), Python 2 & 3
    h5py (binary data storage in HDF5 format), Python 2 & 3
    notedown (tool for converting R markdown files to Jupyter notebooks), Python 3
    jupytext (edit jupyter notebooks as plain text python files), Python 3
    blackfynn (API for interaction with the Blackfynn platform for data storage/analysis), Python 2 & 3

  • R packages:
    ---numerical manipulations and dataframes---
    dplyr (dataframe manipulation)
    plyr (dataframe manipulation)
    reshape2 (dataframe manipulation)
    tidyr (data tidying)
    ---plotting and visualizations---
    ggplot2 (plotting; based on the book "The Grammar of Graphics" by Leland Wilkinson, 2005)
    ---statistics---
    pwr (power analysis)
    psych (among many other functions, provides convenient data summary statistics)
    ez (analysis and visualization of factorial experiments)
    ---electrophysiology---
    STAR (Spike Train Analysis with R)
    bursts (algorithm to detect activity bursts in time series data)

  • Jupyter "how-to" notebooks included in the Docker image:
    "Index" notebook; contains links to the following notebooks stored in the "Example_Jupyter_Notebooks" directory:
    1.check software versions; checks the software versions of Python 3, Python 2, and R and their packages
    2.python and R in same notebook; uses the Python package "rpy2" to create a bridge between Python and R in the same notebook
    3.pass data; uses the built-in notebook command "%store" to move data between notebooks
    4.python load spike2 uses the Python "Neo" package; other file type imports are supported, e.g., Matlab, Axon Instruments, HDF5, Neuroshare, Plexon, Tucker Davis, etc., also, shows how to import an external script into the notebook (a good way to reduce code clutter)
    5.image analysis; uses scikit-image Python package to separate colors in an immunohistological image (http://scikit-image.org).
    6.blackfynn api; brief example of uploading and and checking files on the Blackfynn platform; see Blackfynn's documentation for more information; NOTE: you'll need to set your credentials (i.e., Blackfynn profile) in the Docker container to use the API

  • Customization. You can install additional Python, R, and operating system packages in the container using standard terminal commands, e.g., "pip3" commands for Python 3, "pip" commands for Python 2, "install.packages()" at the R command line, and "apt install" in the Linux terminal for Ubuntu packages (see the example below). You will need to "commit" these software changes to a new image to save them (see below).
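
For example, terminal commands along these lines (run in the container's terminal; the package names are only illustrations, and "apt" may require root access) would add a Python 3 package, a Python 2 package, an R package, and an Ubuntu package:

pip3 install networkx
pip install networkx
R -e 'install.packages("data.table", repos="https://cloud.r-project.org")'
apt update && apt install -y nano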

The Dockerfile used to create the image is included in this repository, but you DO NOT need to create the image yourself. Building it is time consuming and, depending on the speed of your computer, could easily exceed 15 minutes. Instead, you can download the pre-built image from Docker Hub (see below) and begin working on data analysis immediately!

How to use the Docker image

First, download the image by entering this command in the terminal on your host machine (do this after installing the Docker application on your computer):

docker pull cchorn/sparc:jupyter_V1.5

Next, enter:

docker run --rm -it -p 8888:8888 -v ~/Desktop:/home/work cchorn/sparc:jupyter_V1.5

The "docker run" command starts a container based on the image "cchorn/sparc:jupyter_V1.5". The command also contains three flags:
1 - "-it", interactive terminal, which will keep the container running in the terminal until you close it.
2 - "-p", port mapping from the host port on the left and container port on the right of the ":" in the command. This means that when the Jupyter server runs on port 8888 in the container it will map to port 8888 on the host, i.e., you can go to this port in the host's web browser URL address and see the Jupyter notebook.
3 - "-v", volume (folders) mapping from host to container, in this case the container will be able to see the host's "Desktop" folder from the container's "work" folder; the host's folder name should be customized for your computer: please change "Desktop" to match a folder on your computer.
4 - "--rm" flag will cause automatic deletion of stopped containers (stop the running container by executing keystrokes "ctrl-c" in the terminal). If you intend to make software changes and save a new image you need to remove this flag before running the container.

After entering the "docker run" command a URL will be generated (http://localhost:8888/ + a security token). Copy the URL to your browser to see the Jupyter notebook directory (also copy the security token and enter it if requested). On some host machines (e.g., Windows 7 and 8), you will need to determine the ip address of the running container and use this instead of "localhost" (e.g., docker-machine ip).

Success! You should now see the Jupyter lab interface. From here, you can create notebooks using Python and R and access the container's command-line terminal. New notebooks and files you create will be saved to the host folder mapped with the "docker run" command. Any documents that you want to keep must be stored in the container's "work" directory, which is mapped to the host folder you specified with the "-v" flag.

The running container can be stopped by entering keystrokes "ctrl-c" in the host's terminal. If you ran Docker without the "--rm" flag, you can now see the exited container in the container list by entering:

docker ps -a

A container can be removed using the command:

docker rm [container name]

The "container name" is a name generated by Docker, found at the end of each entry in the container list.

Alternatively, if you made software changes, you can save the container as a new image using the "commit" command, for example:

docker commit [container name] [repository name:tag]

This new image will be added to the list of Docker images on your local machine. To see the list, use the "docker images" command. You can also back up the image to your Docker cloud account using the "push" command, and the image can then be shared with the scientific community.
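
For example, after installing extra packages, commands along these lines would save the container as a new image and push it to your account (the repository name and tag are only illustrations; replace "yourdockerid" with your Docker ID and sign in with "docker login" before pushing):

docker commit [container name] yourdockerid/sparc:jupyter_custom
docker push yourdockerid/sparc:jupyter_custom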

Other resources:

If you wish to build the Docker image from the Dockerfile, run the following command from the Dockerfile's folder (Note: downloading the pre-built image from Docker cloud is easier):

docker build --tag cchorn/sparc:jupyter_V1.5 .

Acknowledgements:

Derek Miller, University of Pittsburgh; testing Blackfynn API code
Stephanie Fulton, University of Pittsburgh; testing Docker functionality
Michael Sciullo, University of Pittsburgh; testing Docker functionality

This work was supported by awards from the National Institutes of Health (NIH) - Stimulating Peripheral Activity to Relieve Conditions (SPARC) Program, including these projects:

1. Defining gastric vagal mechanisms underlying emetic activation using novel electrophysiological and optical mapping technology. 3U18EB021772-02S2.
2. Closed-loop neuroelectric control of emesis and gastric motility. 1U18TR002205-01.

MIT license
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.