pangeo-data/pangeo-stacks

How to use pangeo-stacks images with the dask-labextension layout in Binder repos?

Opened this issue · 12 comments

So I'm trying to work on pangeo-data/pangeo-tutorial#14.

I decided to use the pangeo/pangeo-notebook-onbuild:2019.04.19 Docker image, as found in several recent Pangeo deployments. This seems to work; however, I've lost the dask-labextension layout, and I'm not sure what I should do.

Looking at https://github.com/pangeo-data/pangeo-cloud-federation/tree/staging/deployments/nasa/image/binder or https://github.com/pangeo-data/pangeo-cloud-federation/tree/staging/deployments/ocean/image/binder, there seem to be some post-build config files, but no dask-labextension layout.

So what is the correct configuration to use to have a basic Pangeo notebook image with a nice dask-labextension layout?

/cc @ian-r-rose who might know

@guillaumeeb here is a demo I wrote to show how to set up a new layout that works on Binder. It takes a bit of work, but it is doable. The layout is stored in a jupyterlab-workspace file, which you distribute with the Binder repo.
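For context, JupyterLab workspaces are serialized as JSON. There is no formal spec (as noted later in this thread), so the shape below is only a hedged sketch of what `jupyter lab workspaces export` currently emits; the key names and the notebook path are illustrative:

```json
{
  "data": {
    "layout-restorer:data": {
      "main": {
        "dock": { "type": "tab-area", "widgets": ["notebook:index.ipynb"] },
        "mode": "multiple-document"
      }
    }
  },
  "metadata": { "id": "/lab" }
}
```

A file like this can be committed to the Binder repo and loaded at startup (for example with the `jupyter lab workspaces import` CLI), which is exactly the step a `start` script would perform.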

I don't think our current onbuild setup supports the `start` file syntax, which is something that is currently baked into how we use the jupyterlab-workspace features.

Q for @yuvipanda - were there challenges getting the start file entrypoint to work or is this a feature we could implement?
Q for @ian-r-rose - have you heard talk of repo2docker supporting the workspace spec as a known configuration file? This may be an interesting proposal that would eliminate the need for the start file in this use case.

@jhamman I have not heard any talk of that, but it's a neat idea. There is currently no formal spec for workspace files (though it would be nice to have one), so it would be up to the user to provide a well-formed one for their particular binder setup. But it would certainly help in cutting down on the boilerplate start script flimflam (which, as we have seen, is pretty error-prone)

@jhamman we can totally support 'start' in onbuild. I didn't implement it mostly to get an MVP out fast. The way to do that would be:

  1. Implement our own Entrypoint that is called all the time
  2. If we have a custom start file, it'll call that. If not, it'll fall back to calling the default command.

Basically, we need to re-implement https://github.com/jupyter/repo2docker/blob/master/repo2docker/buildpacks/repo2docker-entrypoint in r2d_overlay.py.
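A minimal Python sketch of that fallback logic (the function and constant names here are illustrative, not the actual `r2d_overlay.py` or repo2docker API):

```python
import os

# Assumed location of the stock repo2docker entrypoint (see the link above)
DEFAULT_ENTRYPOINT = '/usr/local/bin/repo2docker-entrypoint'

def resolve_entrypoint(start_path, args, default=DEFAULT_ENTRYPOINT):
    """Pick the command to exec: the repo's custom start script if one
    exists, otherwise fall back to the default entrypoint. Either way,
    the original command-line arguments are passed through."""
    if os.path.exists(start_path):
        return [start_path] + list(args)
    return [default] + list(args)
```

The real entrypoint would then `os.execv` the returned command, so that the start script (or the default command) replaces the Python process.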

IMO, the bug is possibly in workspaces needing base_url. See jupyterlab/jupyterlab#5977 for more details. Changing that would fix the start related issues, and also make this much more robust in a lot of use cases. Based on jupyterlab/jupyterlab#5977 (comment) it's unclear why it is needed :)

Hey, I am having a hack at this.
See https://github.com/scollis/pangeo-stacks/blob/addstart/onbuild/r2d_overlay.py#L112

One thing I don't understand (I am a Docker noob): given that the build step goes in here as

ONBUILD RUN /usr/local/bin/r2d_overlay.py build

is the start step something like

ENTRYPOINT /usr/local/bin/r2d_overlay.py start

@scollis something like that! One addition to your start script would be to make sure it works when there's no 'start' script present. In that case, it should default to calling /usr/local/bin/repo2docker-entrypoint (https://github.com/jupyter/repo2docker/blob/master/repo2docker/buildpacks/base.py#L182), which falls back to what repo2docker does by default.

You should probably also just pass the path directly instead of passing it as an arg to /bin/bash.

Thank you for working on this!

@yuvipanda if a start script is present, should it run it and then run /usr/local/bin/repo2docker-entrypoint?

@scollis I think it should only run repo2docker-entrypoint if a start script is not present...

Awesome. I am at ORNL and just about to leave. Pushing a Docker image to Docker Hub now; once I am back at the hotel, I don't think the wifi can handle a 30 GB upload :D

@yuvipanda
"You should probably also just pass the path directly instead of passing it as an arg to /bin/bash."

I am copying what is done in postbuild. So you are saying I should do:

    # Enable additional actions in the future
    applicators = [apply_start]

    for applicator in applicators:
        commands = applicator()

        if commands:
            for command in commands:
                subprocess.check_call(
                    [command], preexec_fn=applicator._pre_exec
                )

@become(NB_UID)
def apply_start():
    st_path = binder_path('start')

    if os.path.exists(st_path):
        return [
            f'chmod +x {st_path}',
            # since st_path is a fully qualified path, no need to add a ./
            f'{st_path}'
        ]