/containers

Files to build various containers for LS4GAN

Primary LanguageDockerfile

This package collects recipes to build container images that may be used for LS4GAN related development and processing.

General information

Images are defined via a Dockerfile and some additional files in a sub directory. From a Docker image a Singularity image may be derived and images are pushed to Docker Hub and/or SDCC’s registry.

In the remainder of this section, basic and generic guidance is documented. In the next section, descriptions of available images are listed. Finally, some guidance in dealing with SDCC’s registry is provided.

Build

A container image may be built locally with:

$ mkdir ls4gan
$ git clone https://github.com/LS4GAN/containers.git ls4gan/containers
$ cd ls4gan/containers/docker/<name>
$ docker build -t ls4gan/<name>:<version> .

Note, some images build on top of others that are also built from here. The naming convention illustrated above should be kept in mind when looking at a FROM line in a Dockerfile.

Run a container

Each container has its own default command (CMD) of bash which is run as the argument to the entry point (ENTRYPOINT) of bash -c. Thus to get an interactive shell:

$ docker run -ti ls4gan/wirecell
#

Or to run a command provided by the image:

$ docker run -ti ls4gan/wirecell "wire-cell --help"
[...help message...]

Singularity

To derive a Singularity container image from a Docker image:

$ singularity build ls4gan-<name>-latest.sif  $IMAGE_URL 

The $IMAGE_URL can be in one of several forms

local
docker-daemon://ls4gan/<name>:latest
docker hub
docker://ls4gan/<name>:latest
SDCC registry

It recommended to follow the illustrated naming convention so that some provenance is kept when sharing the resulting image file.

To run a Singularity container

$ singularity exec ls4gan-<name>-latest.sif "wire-cell --help"

It is recommended to name the Singularity image file following the convention so that when sharing these files their origin is hinted.

To run the default shell or a program in the container

$ singularity run  /srv/tmp/ls4gan-wirecell-latest.sif "wire-cell --help"
[ ...help message...]
$ singularity run  /srv/tmp/ls4gan-wirecell-latest.sif
Singularity> which wire-cell
/usr/local/bin/wire-cell

Docker Hub

The ls4gan area on Docker Hub holds some of the images produced here. In the examples we will use the ls4gan/wirecell image. To get this image into your local docker, run:

$ docker pull ls4gan/wirecell:latest

Or, if you are building images and a docker login they may be uploaded with, eg:

$ docker push ls4gan/wirecell:latest

Container Images

wirecell

This image provides the Wire-Cell Toolkit C++ and Python and their externals. It is built on a minimal Debian with WCT and additional software installed under /usr/local/

This environment can be used to run any “stand-alone” wire-cell job or any of the wirecell-* Python CLIs. It also provides lots of Python goodies including Numpy, Matplotlib, ipython, JupyterLab (needing special docker run to see its ports).

It also provides snakemake so can be used to exercise the toyzero data generator.

Using the derived Singularity image to enjoy easy access to native home directory files:

$ git clone https://github.com/LS4GAN/toyzero.git
$ cd toyzero
$ singularity run /srv/tmp/ls4gan-wirecell-latest.sif "snakemake -jall -p just_images"
$ tree data
[...generated data files...]

The approximate equivalent directly with Docker in a minimal environment is like:

$ cd .. # parent holding local toyzero/
$ mkdir run
$ cd run/
$ cp -a ../toyzero/{Snakefile,cfg,toyzero.yaml} .
$ docker run \
    --user user \
    --volume (pwd):/data \
    -ti ls4gan/wirecell:0.16.0 \
    "cd /data && snakemake just_images -j1 -p --config seed=1234 outdir=test10 ntracks=100 nevents=10 wcloglvl=debug threads=8"
$ tree test1 

Versions

The ls4gan/wirecell container image is versioned to reflect the version of Wire-Cell Toolkit. The version of Wire-Cell Python and other packages may be chosen to be newer. Defaults are provided or may be overridden, eg:

$ docker build -t ls4gan/wirecell:0.15.0 \
   [ --build-arg WCPYTHON_VERSION=X.Y.Z ... ] \
   .
$ docker push ls4gan/wirecell:0.15.0

runner

This container provides (will provide) support for running a toyzero pipeline as a single, ready-to-run job. It rides on top of the wirecell container.

notebook

This container provides (will provide) support for JupyterLab notebooks. It includes ls4gan-python, toytools and other Python

BNL/SDCC Registry

BNL/SDCC provides a container registry called “Portus” at the internal nost registry.sdcc.bnl.gov. It is used much like Docker Hub but one must give the hostname explicitly:

❯ docker pull registry.sdcc.bnl.gov/toyzero/wirecell

Remote access

Outside the BNL network it is possible to access the registry by forwarding a local port via SSH to an internal HTTPS proxy. Basically, follow docker guidance on setting HTTP/HTTPS proxy. For example

# cat <<EOF > /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://127.0.0.1:3128"
Environment="HTTPS_PROXY=http://127.0.0.1:3128"
Environment="NO_PROXY=localhost,127.0.0.1,.home,haiku"
EOF
# systemctl daemon-reload
# systemctl restart docker

It is also helpful to add the internal IP address to /etc/hosts.

It should now be possible to login

❯ docker login registry.sdcc.bnl.gov

And the two steps to register and upload an image:

❯ docker tag ls4gan/wirecell registry.sdcc.bnl.gov/toyzero/wirecell
❯ docker push registry.sdcc.bnl.gov/toyzero/wirecell

Or, docker pull as above.

Singularity

Currently building a Singularity image from a Docker image in Portus does not work. Expect an error like:

❯ singularity pull --docker-login docker://registry.sdcc.bnl.gov/ls4gan/toyzero/wirecell
Enter Docker Username: bvlbne
Enter Docker Password: 
FATAL:   While making image from oci registry: error fetching image to cache: failed to get checksum for docker://registry.sdcc.bnl.gov/ls4gan/toyzero/wirecell: error pinging docker registry registry.sdcc.bnl.gov: Get "https://registry.sdcc.bnl.gov/v2/": dial tcp 130.199.148.226:443: i/o timeout

This may be due to offsite access restriction. In any case, it needs more checking.

Example for toyzero

Toy “one”

Toy “one” is the first “real” use of toyzero.

Docker

First production run on SDCC will involve jobs something like this with docker:

$ docker run --user user --volume (pwd):/data \
  -ti ls4gan/toyzero:0.3.0 \
  "cd toyzero && \
   snakemake just_tar -j1 -p \
   --config outdir=/data seed=1234 ntracks=100 nevents=1 wcloglvl=debug threads=8"
$ ls -lh toyzero-100-1-1234.tar 
-rw-r--r-- 1 bv bv 13M Aug 19 16:00 toyzero-100-1-1234.tar
$ du -sh seed-1234
4.8M	seed-1234
$ rm -rf seed-1234

Note:

  • We bind mount CWD to /data in the container and then use that as outdir
  • The seed MUST be chosen unique each run
  • The nevents chosen to match run time
  • The threads say how many threads are used for wire-cell. Likely set to 1 for batch processing. More can be used but the RAM usage will increase. At 8 threads about 4 GB RAM is used. There is also -j1 to limit number of jobs that snakemake will run concurrently. There are two wire-cell jobs in the graph and they provide the bottleneck. Setting this to 2 (with threads=1) would be reasonable.
  • A ./.snakemake/ directory will get made and used inside the container and thus not retained, but it is not needed.
  • The results to keep are in a tar file by default named toyzero-{ntracks}-{nevents}-{seed}.tar which is built from contents of directory {outdir}/seed-{seed}/. This directory can be discarded once the tar file is secured.

Singularity

Derive the Singularity image from the docker.

$ singularity build ls4gan-toyzero-030.sif \
    docker://ls4gan/toyzero:0.3.0
$ ls -lh ls4gan-toyzero-030.sif
-rwxr-xr-x 1 bv bv 580M Aug 19 16:58 ls4gan-toyzero-030.sif

Singularity requires a little awkward run pattern to first copy the toyzero config from the container’s user account to a host CWD owned by the native user.

$ mkdir run-singularity
$ cd run-singularity/
$ singularity run ../ls4gan-toyzero-030.sif \
  "cp -a /home/user/toyzero/* . && \
  snakemake just_tar -j1 -p --config  seed=1234 ntracks=100 nevents=1 wcloglvl=debug threads=8"
$ ls -lh toyzero-100-1-1234.tar
-rw-r--r-- 1 bv bv 13M Aug 19 17:15 toyzero-100-1-1234.tar
$ mv toyzero-100-1-1234.tar ..
$ cd ..
$ rm -rf run-singularity/

  • [ ] fix warning from singularity
2021/08/19 17:20:14  warn rootless{opt/oneapi-tbb-2021.1.1/lib/intel64/gcc4.8/libtbb.so} ignoring (usually) harmless EPERM on setxattr "user.rootlesscontainers"
...
2021/08/19 17:20:20  warn rootless{home/user/toyzero/.git/objects/pack/pack-6cffdba70a871b245ca1201e3133eb5a1adf298a.idx} ignoring (usually) harmless EPERM on setxattr "user.rootlesscontainers"