pangeo-data/pangeo-stacks

consistent repo2docker versions?

scottyhq opened this issue · 4 comments

Should we try to keep the repo2docker version the same across our various repos that use it for building pangeo images? Seems like doing so will help guarantee that a given repo builds to the same environment and is runnable on different hubs.

And also in sync with mybinder.org? (build_image: jupyter/repo2docker:bf29f66f)
which updates regularly - jupyterhub/mybinder.org-deploy#1091
https://github.com/jupyterhub/mybinder.org-deploy/blob/43ae77db9d3d7961964a6e1d383f3749a6d6dbe6/mybinder/values.yaml#L68

cc @jhamman

@scottyhq - I think we should keep with the release early/often paradigm here. Ideally, everything is in sync and up to date. To do that, we probably need some sort of automated system. As you mentioned in your post, mybinder.org now has a bot updating their helm chart and repo2docker versions. Pangeo has a more complex dependency system, but issues like this one highlight how automation could be useful at this point.

tldr; +1 on bumping all versions of r2d to match with mybinder.org (I did the google binderhub this morning).
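As a rough illustration of the kind of automated check that could flag drift, here is a minimal Python sketch. The function name `find_version_drift`, the dictionary layout, and the non-mybinder pin values are all hypothetical; only the `bf29f66f` tag comes from this thread. It just compares each repo's pinned repo2docker tag against a chosen reference.

```python
def find_version_drift(pins, reference):
    """Return {name: pin} for every entry whose pin differs from the reference entry.

    `pins` maps a deployment or repo name to its pinned repo2docker tag.
    `reference` is the key in `pins` that everything else should match.
    """
    expected = pins[reference]
    return {name: pin for name, pin in pins.items() if pin != expected}


# Hypothetical pins: only the mybinder.org value is taken from the issue.
pins = {
    "mybinder.org": "bf29f66f",
    "pangeo-stacks": "bf29f66f",
    "pangeo-cloud-federation": "example-old-pin",
}

for name, pin in find_version_drift(pins, "mybinder.org").items():
    print(f"{name} pins repo2docker at {pin}, expected {pins['mybinder.org']}")
```

How the pins are collected (parsing a values.yaml, a requirements file, etc.) is left out on purpose, since it differs per repo.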

Reopening b/c I think there are a few more things to work out. The build scripts in pangeo-stacks (https://github.com/pangeo-data/pangeo-stacks/blob/master/build.py) and hubploy (https://github.com/yuvipanda/hubploy/blob/master/hubploy/imagebuilder.py, used in pangeo-cloud-federation) are slightly different.

An image built with hubploy currently drops a user into a jupyter session in the srv/repo directory https://github.com/yuvipanda/hubploy/blob/master/hubploy/imagebuilder.py#L42

The same image definition built in pangeo-stacks puts a user in /home/jovyan (which is what we want for jupyterhubs and binder)

@scottyhq - can you explain a bit more why there is a problem with the differing launch behavior?

Long story short, it would be great to ensure images we are building work equivalently on both BinderHub and JupyterHub. For BinderHub, it doesn't really matter which folder you end up in b/c the files you want to work with are typically the repo contents.

By default repo2docker runs in /home/jovyan. But for JupyterHub, we overwrite /home/jovyan with the NFS mount so any repo2docker build files (including repo contents) are overwritten if we run the same image on JupyterHub.

Repo2docker does allow you to use a different REPO_DIR, but that directory then becomes the default working directory when the image runs on JupyterHub. That is confusing b/c I think users should start out in /home/jovyan, and it also makes browsing for files difficult (see jupyterlab/jupyterlab#2532).
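To make the two build variants concrete, here is a small Python sketch that assembles a repo2docker command line with and without a separate repo directory. The helper name `repo2docker_build_command` and the image names are hypothetical, and this is not the code in build.py or hubploy; it only assumes the `--no-run`, `--image-name`, and `--target-repo-dir` options of the `jupyter-repo2docker` CLI.

```python
import shlex


def repo2docker_build_command(repo_path, image_name, target_repo_dir=None):
    """Build (but do not run) a jupyter-repo2docker command as an argument list.

    With target_repo_dir=None the repo contents land in the default location
    (the user's home directory); otherwise they are copied to target_repo_dir.
    """
    cmd = ["jupyter-repo2docker", "--no-run", "--image-name", image_name]
    if target_repo_dir is not None:
        cmd += ["--target-repo-dir", target_repo_dir]
    cmd.append(repo_path)
    return cmd


# Default layout: repo contents end up under the home directory.
print(shlex.join(repo2docker_build_command(".", "example/pangeo-notebook:dev")))

# Separate repo dir: contents survive a home-directory mount,
# but the default working directory moves with them.
print(shlex.join(
    repo2docker_build_command(".", "example/pangeo-notebook:dev", "/srv/repo")
))
```

The list can be handed to `subprocess.run(cmd, check=True)` if you actually want to build.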

So I think we might always want to just run in /home/jovyan

  • but make sure in binder repos we have data and notebooks in a different branch or repo and pull that content after starting w/ nbgitpuller
  • double check that repo2docker doesn't create any necessary config files in /home/jovyan that are being overwritten (as far as I know it doesn't by design but I'm thinking of .conda, .bashrc, etc)
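
For the nbgitpuller idea in the first bullet, a minimal sketch of generating the pull link for a JupyterHub might look like the following. The hub URL, content repo, and function name are hypothetical placeholders; the sketch assumes nbgitpuller's `git-pull` handler with its `repo`, `branch`, and `urlpath` query parameters, reached through JupyterHub's `/hub/user-redirect/` path.

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url, repo_url, branch="master", urlpath=None):
    """Return a JupyterHub link that pulls repo_url into the user's home on launch."""
    params = {"repo": repo_url, "branch": branch}
    if urlpath:
        # Where to send the user after the pull, e.g. a folder in JupyterLab.
        params["urlpath"] = urlpath
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{urlencode(params)}"


# Hypothetical hub and content repo, for illustration only.
link = nbgitpuller_link(
    "https://hub.example.org",
    "https://github.com/example-org/example-content",
    urlpath="lab/tree/example-content",
)
print(link)
```

Because the content is cloned into /home/jovyan after the server starts, it lands on the NFS mount instead of being hidden by it.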