Create a repository for Pangeo curated Docker images
guillaumeeb opened this issue · 10 comments
Idea proposed by @jhamman in pangeo-data/helm-chart#34 (comment).
This is part of the cleaning/reviving of Pangeo Helm-chart and related dependencies.
Some extracts from the issue above:
We may want to consider removing the docker files from the helm chart repo altogether. One idea that somewhat interests me is creating one or more curated images that can be used for pangeo.
I'm thinking of a repository similar to https://github.com/dask/dask-docker with a directory for each image. The images could be defined by dockerfile (as they are here) or a binder spec (as they are in the hubploy cases). In either case, we could add a simple CICD script to build the images using repo2docker on circleci (or similar) and push them to dockerhub.
So Project Jupyter already does this: https://github.com/jupyter/docker-stacks
Maybe we try to emulate what they are doing (or just team up with them for a few base images).
First question is: do we need a Pangeo repo, or is it possible to put our images into the jupyter repo? I'm not sure how many images we want to propose; I currently see only one basic image, taken from the helm-chart repo.
ping @jhamman, @jacobtomlinson, @yuvipanda
@yuvipanda - would docker-stacks be up for a pangeo image?
> would docker-stacks be up for a pangeo image?
This seems somewhat unlikely based on the discussion in jupyter/docker-stacks#517. That said, I'm sure common improvements to the existing images there are welcome (maybe some of these pull in the same direction: jupyter/docker-stacks#748). Plus, linking from the docs over there to wherever these images live is likely welcome.
Thanks @jakirkham.
Yes, I think we should put together our own version of docker-stacks, preferably using repo2docker. If anyone wants to get started on this, let me know and I can help grease the skids.
Should have added that there are some recommendations for community Docker images in these docs. Basically, there is a cookiecutter repo that has been set up for this purpose, which should make it easier to get something up and running quickly. Of course, there are other tools along these lines that may work nicely too.
I'd really like to dive into all this: it is tightly linked to the helm-chart and I'd like to learn more. I'm slightly overwhelmed by day-to-day work right now, but I will try to make room for Pangeo as soon as I can.
In the meantime, if anyone can work on this, I'll be glad to follow closely.
The Jupyter docs highlighted by @jakirkham are based on plain Docker images, whereas @jhamman is talking about repo2docker, which seems like a nice option!
So shall we do it the official Jupyter way, or using maybe more recent tooling?
@yuvipanda did not chime in, and maybe @minrk has something to say too!
I'd say use repo2docker, but I'm biased :) repo2docker didn't exist when jupyter-stacks were created, so that discussion was never had. If repo2docker doesn't do what you wish, you can always just use a Dockerfile - which repo2docker also supports.
This is what we're gonna be using for hubploy, which is how the Pangeo jupyterhubs are going to be deployed. That gives it another +1 for using it here.
@yuvipanda - thanks for chiming in. We talked about this in our meeting today and @mrocklin had some similar thoughts. I think the right structure is something like this.
- 1 top-level directory for each image, with binder-style configs
- use repo2docker to build images
- use repo2docker's appendix to inject a few "Pangeo requirements"; these would be fairly minimal but would ensure images can run on a Pangeo system
- some limited CI/CD for pushing images to dockerhub
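To make the structure above a bit more concrete, here is a minimal Python sketch of what a CI build step could look like. It only assembles `jupyter-repo2docker` command lines and does not run them. The `pangeo` organisation name, the `latest` tag, the `binder/` directory check, and the appendix text are all assumptions made for illustration; the actual "Pangeo requirements" are not specified in this thread.

```python
from pathlib import Path
from typing import List

# Hypothetical appendix: a Dockerfile snippet that repo2docker appends to the
# generated image. The real "Pangeo requirements" are not defined here.
PANGEO_APPENDIX = "RUN echo 'Pangeo requirements would be installed here'"


def find_image_dirs(repo_root: Path) -> List[Path]:
    """Return top-level directories that contain a binder/ config directory."""
    return sorted(
        d for d in repo_root.iterdir()
        if d.is_dir() and (d / "binder").is_dir()
    )


def build_command(image_dir: Path, org: str = "pangeo",
                  tag: str = "latest", push: bool = False) -> List[str]:
    """Assemble a repo2docker command that builds (and optionally pushes) one image."""
    cmd = [
        "jupyter-repo2docker",
        "--no-run",  # build only, do not start a notebook server
        "--image-name", f"{org}/{image_dir.name}:{tag}",
        "--appendix", PANGEO_APPENDIX,
    ]
    if push:
        cmd.append("--push")  # push to the registry after a successful build
    cmd.append(str(image_dir))
    return cmd


if __name__ == "__main__":
    # Print one command per image directory found under the current directory.
    for image_dir in find_image_dirs(Path(".")):
        print(" ".join(build_command(image_dir, push=False)))
```

A CI job could pass each command list to `subprocess.run(cmd, check=True)` after logging in to the registry, with `push=True` only on the main branch.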
How does this all sound?
Hey, that was easier than I expected! I've created this repo (https://github.com/pangeo-data/pangeo-stacks) and everything we've talked about here is either already working or listed as a todo issue in the repo. I'm going to close this and encourage others interested to weigh in over there.