pangeo-data/pangeo-stacks

Why does conda have to solve an environment with onbuild images?

Closed this issue · 12 comments

With @cgentemann, I'm trying to update our tutorial to use the latest onbuild image. We are using pangeo/pangeo-notebook-onbuild:2020.02.16-e0f17a8.

I'm confused because binder is taking a LONG time to build the image. It has been solving an environment for ~30 minutes.

Waiting for build to start...
# Executing 5 build triggers
 ---> Running in 0bed3634fef6
Removing intermediate container 0bed3634fef6
 ---> Running in 0655f8a0a683
Reading package lists...
Building dependency tree...
Reading state information...
vim is already the newest version (2:8.0.1453-1ubuntu1.1).
0 upgraded, 0 newly installed, 0 to remove and 29 not upgraded.
Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies.  Conda may not use the correct pip to install your packages, and they may end up in the wrong place.  Please add an explicit pip dependency.  I'm adding one for you, but still nagging you.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working...

There is no environment.yml file in the tutorial's binder directory:
https://github.com/cgentemann/osm2020tutorial/tree/master/binder

So I'm confused why an environment needs to be solved. The older onbuild images didn't work this way.

Update: after about 30 minutes, it finally decided it needed to install the following:

Downloading and Extracting Packages
pyinterp-0.2.0       | 526 KB    | ########## | 100%
intake-esm-2019.12.1 | 19 KB     | ########## | 100%
pandoc-2.9.2         | 16.8 MB   | ########## | 100%

I'm pretty confused by this.

Those are new packages, so I'm guessing a conda update is run at some point. However, this gave me another idea. There is a hack-y implementation of a conda-lockfile that avoids the solver that could be useful here.
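To make the "avoid the solver" idea concrete, here is a minimal sketch. It does not use the lockfile project mentioned above; it swaps in conda's built-in explicit spec files (`conda list --explicit` and `conda create --file`), which install exact package URLs without dependency resolution. The function names and the example paths are hypothetical.

```python
import subprocess
from pathlib import Path


def export_command(prefix):
    """Command that prints an explicit (fully pinned, URL-based) package list."""
    return ["conda", "list", "--explicit", "-p", str(prefix)]


def restore_command(prefix, spec_file):
    """Command that recreates an environment from an explicit spec file.

    Installing from an explicit spec skips dependency solving.
    """
    return ["conda", "create", "--yes", "-p", str(prefix), "--file", str(spec_file)]


def write_explicit_spec(prefix, spec_file):
    """Run the export command and save its output (requires conda on PATH)."""
    result = subprocess.run(
        export_command(prefix), check=True, capture_output=True, text=True
    )
    Path(spec_file).write_text(result.stdout)


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the two commands.
    print(" ".join(export_command("/srv/conda/envs/notebook")))
    print(" ".join(restore_command("/srv/conda/envs/notebook", "spec.txt")))
```

The trade-off is that an explicit spec is platform-specific and has to be regenerated whenever the environment changes.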

But I don't understand why conda update is run at all here! Isn't the idea of these docker images to freeze the environment so we can easily create binders / hub environments without a lengthy build step?

I don't know anything about the workflow here. All I know is that those packages were updated recently, meaning an update command (or a new env on top of an existing cache) was issued.

@rabernat - I don't have time to dig in today, but the new onbuild images do have a modified Dockerfile and r2d_overlay script for enabling 'start' scripts (https://github.com/pangeo-data/pangeo-stacks/commits/master/onbuild/r2d_overlay.py). See the summary of changes from @yuvipanda in scollis#1. I suspect things haven't been tested much. The other consideration could be changes within repo2docker and how r2d_overlay plays with those changes (pangeo-data/pangeo-binder#93).

@rabernat, the onbuild images have an environment.yml file, and when you build from an onbuild image, r2d_overlay.py will build the environment using the files that are already there:

(base) tjc@abby:~/github/ooicloud/pangeo-stacks/tmp$ docker run -i -t 6bca /bin/bash
root@be70f6fa442a:~# ls binder/
apt.txt  dask_config.yaml  Dockerfile  environment.yml  postBuild  tests  verify
root@be70f6fa442a:~#

When I tried building an image from this onbuild image, conda would not even finish solving, so you did better than I did:

(base) tjc@abby:~/github/ooicloud/pangeo-stacks/tmp$ cat Dockerfile 
FROM pangeo/pangeo-notebook-onbuild:2020.02.16-e0f17a8
(base) tjc@abby:~/github/ooicloud/pangeo-stacks/tmp$ docker build .
Sending build context to Docker daemon  2.048kB
Step 1/1 : FROM pangeo/pangeo-notebook-onbuild:2020.02.16-e0f17a8
# Executing 5 build triggers
 ---> Running in 0d737fc81222
Removing intermediate container 0d737fc81222
 ---> Running in 18152315f1c2
Reading package lists...
Building dependency tree...
Reading state information...
vim is already the newest version (2:8.0.1453-1ubuntu1.1).
0 upgraded, 0 newly installed, 0 to remove and 31 not upgraded.
Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies.  Conda may not use the correct pip to install your packages, and they may end up in the wrong place.  Please add an explicit pip dependency.  I'm adding one for you, but still nagging you.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... Traceback (most recent call last):
  File "/usr/local/bin/r2d_overlay.py", line 147, in <module>
    main()
  File "/usr/local/bin/r2d_overlay.py", line 141, in main
    build()
  File "/usr/local/bin/r2d_overlay.py", line 109, in build
    ['/bin/bash', '-c', command], preexec_fn=applicator._pre_exec
  File "/srv/conda/envs/notebook/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/bin/bash', '-c', 'conda env update -p /srv/conda/envs/notebook -f /home/jovyan/.onbuild-child/binder/environment.yml']' died with <Signals.SIGKILL: 9>.
The command '/bin/sh -c /usr/local/bin/r2d_overlay.py build' returned a non-zero code: 1

Have you tried using the non-onbuild image? It's not clear to me that you need the onbuild image, but maybe I'm not fully understanding what you are trying to do.

I think it is worth pointing out that the issue you are having is not your fault, and I think it's a real problem that we need to solve. My feeling is that onbuild images with r2d_overlay.py embedded in them are too complex and too confusing. I have not seen much reasoning behind this complexity except, essentially, "Dockerfiles are hard." They aren't.

It's not clear to me why Pangeo project images cannot all be fully pre-built, starting from base-notebook, with a pretty simple Dockerfile. And maybe an environment.yml file if you really want that to be a separate file, but it doesn't need to be. If there is something I am missing, please let me know, because @scottyhq and I are going to make an effort to refactor into something simpler so it would be good to know where any pitfalls are.

I think if you install new packages on top of any existing conda installation, conda has to solve the environment again. The overlay will be installing new packages... However, my understanding of how conda does stuff is not complete, so I could be wrong.

I don't know why it takes 30 minutes though!

@scottyhq I don't think the start script should cause any related changes here, though.

@tjcrone your error looks like conda update ran out of memory and was killed. This makes me think this might be something like a pathological case for conda? idk.
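For reference, the "died with <Signals.SIGKILL: 9>" text in the traceback comes from Python's subprocess module, which reports a child killed by signal N as return code -N on POSIX. A small sketch of how that can be checked (the function name `describe_failure` is made up for illustration; SIGKILL is consistent with the kernel's out-of-memory killer but does not prove it, since anything that force-kills the process looks the same):

```python
import signal
import subprocess


def describe_failure(cmd):
    """Run cmd and report whether it succeeded, exited non-zero, or was killed."""
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError as err:
        if err.returncode < 0:
            # Negative return code: child was terminated by a signal (POSIX only).
            sig = signal.Signals(-err.returncode)
            return f"killed by {sig.name}"
        return f"exited with status {err.returncode}"
    return "succeeded"


if __name__ == "__main__":
    # The shell sends SIGKILL to itself, imitating a process that gets force-killed.
    print(describe_failure(["/bin/sh", "-c", "kill -9 $$"]))
    print(describe_failure(["/bin/sh", "-c", "exit 3"]))
```

Checking the host's kernel log or the container's memory limit would be the way to confirm whether the OOM killer was actually responsible.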

@yuvipanda, they are not trying to add any new packages: https://github.com/cgentemann/osm2020tutorial/tree/master/binder. They are starting with onbuild, which goes through the process of building the environment.yml file that is embedded.

@tjcrone oooh, very interesting! I'll take a look shortly, in that case.

It looks like e705b78 removed this line:

ONBUILD RUN rm -rf ${REPO_DIR}/.onbuild-child

from onbuild/Dockerfile. That means r2d_overlay.py thinks there is an environment.yml in the repo it is building, and this environment.yml is the environment.yml from the base image (in this repo). Trying to re-solve that is probably conda's worst-case performance.
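A toy simulation of that failure mode, in case it helps. This is not the real r2d_overlay.py: it assumes the onbuild triggers copy the repository into `.onbuild-child` before the overlay script runs (the path in the traceback suggests this, but the thread does not spell it out), and the function names and the `postBuild` file in the child repo are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def overlay_environment_file(child_dir):
    """Return the environment.yml the overlay step would act on, if any."""
    candidate = child_dir / "binder" / "environment.yml"
    return candidate if candidate.is_file() else None


def child_build_sees_environment(cleanup):
    """Simulate building the onbuild image and then a child repo on top of it."""
    with tempfile.TemporaryDirectory() as tmp:
        child = Path(tmp) / ".onbuild-child"

        # Stage 1: the onbuild image's own build copies its binder/ files here.
        (child / "binder").mkdir(parents=True)
        (child / "binder" / "environment.yml").write_text("name: notebook\n")
        if cleanup:
            # Equivalent of: ONBUILD RUN rm -rf ${REPO_DIR}/.onbuild-child
            shutil.rmtree(child)

        # Stage 2: a child repo with no environment.yml is copied on top.
        (child / "binder").mkdir(parents=True, exist_ok=True)
        (child / "binder" / "postBuild").write_text("#!/bin/bash\n")

        return overlay_environment_file(child) is not None


if __name__ == "__main__":
    print("without cleanup:", child_build_sees_environment(cleanup=False))
    print("with cleanup:   ", child_build_sees_environment(cleanup=True))
```

Without the cleanup step the stale file survives into the child build, so the overlay would run `conda env update` against the base image's own environment even though the child repo never asked for one.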

@scottyhq do you remember why that line was removed?

@yuvipanda it came from merging the changes in scollis@2fd47b3#diff-5d0850764a4cb0d890e2a2a203f14dd8. Let's get it back in if that fixes the issue, at least temporarily!