jupyter/docker-stacks

Populate home dir if it is empty

benz0li opened this issue · 26 comments

What docker images this feature is applicable to?

jupyter/base-notebook

What changes do you propose?

Populate home dir if it is empty.

How will this change affect users?

If a user mounts to /home/{raw_username} it will be populated from /home/jovyan on initial startup.
ℹ️ NB_USER set to {raw_username}; the unescaped username (e.g. from JupyterHub).

diff --git a/base-notebook/start.sh b/base-notebook/start.sh
index 05c1037..9e59ab2 100644
--- a/base-notebook/start.sh
+++ b/base-notebook/start.sh
@@ -123,6 +123,15 @@ if [ "$(id -u)" == 0 ] ; then
                     exit 1
                 fi
             fi
+        # The home directory could be bind mounted. Populate it if it is empty
+        elif [[ "$(ls -A "/home/${NB_USER}" 2> /dev/null)" == "" ]]; then
+            _log "Populating home dir /home/${NB_USER}..."
+            if cp -a /home/jovyan/. "/home/${NB_USER}/"; then
+                _log "Success!"
+            else
+                _log "Failed to copy data from /home/jovyan to /home/${NB_USER}!"
+                exit 1
+            fi
         fi
         # Ensure the current working directory is updated to the new path
         if [[ "${PWD}/" == "/home/jovyan/"* ]]; then

|| ln -s /home/jovyan "/home/${NB_USER}" is obsolete, because the home dir is copied [and not moved] now.

In my custom JupyterLab docker stack a duplicate of /home/jovyan is kept at /var/tmp/jovyan. If a user bind mounts to /home/jovyan itself and NB_USER is set to jovyan, the home dir is pre-populated by the following script:

/usr/local/bin/start-notebook.d/populate.sh:

#!/bin/bash

set -e

if [[ "$(ls -A "/home/jovyan" 2> /dev/null)" == "" ]]; then
    cp -a /var/tmp/jovyan/. /home/jovyan
fi

Which are the files here that you care about? Is your home directory totally empty, like missing .bashrc? Or is it the work folder? I'm trying to understand what test cases we should write.

Yes, the home directory is entirely empty if bind mounted to /home/{raw_username}. As it exists, ! -e "/home/${NB_USER}" is FALSE, thus nothing is copied from /home/jovyan/ to /home/${NB_USER}.
ℹ️ NB_USER set to {raw_username}
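
To illustrate the difference between the two checks, here is a minimal sketch. It assumes a temporary directory as a stand-in for an empty bind-mounted /home/${NB_USER}; the helper name is_empty_dir is hypothetical and not part of start.sh.

```shell
#!/bin/bash

# Hypothetical helper: succeeds if the directory has no entries
# (or cannot be listed at all, e.g. because it does not exist).
is_empty_dir() {
    [[ -z "$(ls -A "$1" 2> /dev/null)" ]]
}

# A temporary directory stands in for an empty bind-mounted home directory.
mounted_home="$(mktemp -d)"

# Existence check: the mount point exists, so nothing would be copied.
if [[ ! -e "${mounted_home}" ]]; then
    echo "missing: would be populated"
else
    echo "exists: the existence check skips population"
fi

# Emptiness check: an empty mount point is detected.
if is_empty_dir "${mounted_home}"; then
    echo "empty: would be populated"
fi

rmdir "${mounted_home}"
```

The emptiness check covers both the missing directory and the empty bind mount, which is why the proposed diff tests the output of ls -A rather than existence.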

It's the whole content of /home/jovyan/ I care about.

For Jupyter's base-notebook, this would take care of

-rw-rw-r-- 1 jovyan users  220 Feb 25  2020 .bash_logout
-rw-rw-r-- 1 jovyan users 3823 Nov  8 05:07 .bashrc
drwsrwsr-x 2 jovyan users 4096 Nov  8 05:09 .cache
drwsrwsr-x 1 jovyan users 4096 Nov  8 05:08 .conda
drwsrws--- 3 jovyan users 4096 Nov  8 05:09 .config
drwsrws--- 2 jovyan users 4096 Nov  8 05:09 .jupyter
-rw-rw-r-- 1 jovyan users  807 Feb 25  2020 .profile
-rw-rw-r-- 1 jovyan users  227 Nov  8 05:07 .wget-hsts
drwsrwsr-x 2 jovyan users 4096 Nov  8 05:07 work

For my custom JupyterLab image, this would take care of

-rw-r--r--  1 jovyan users  220 Aug  4 20:25 .bash_logout
-rw-r--r--  1 jovyan users 3768 Nov  5 10:50 .bashrc
drwxr-xr-x  3 jovyan users 4096 Nov  5 10:50 .local
drwxr-xr-x 12 jovyan users 4096 Nov  5 10:50 .oh-my-zsh
-rw-r--r--  1 jovyan users  807 Aug  4 20:25 .profile
-rw-r--r--  1 jovyan users 4219 Nov  5 10:50 .zshrc

Only for the initial/first startup and if the mounted home directory at /home/${NB_USER} is empty.

@consideRatio the merge seems to have unintentionally caused this issue to be closed prematurely. (Probably I shouldn't have had "close #1478" in my description! 😆)

> Or is it the work folder?

I find the suggestion of -v "${PWD}":/home/jovyan/work at https://github.com/jupyter/docker-stacks#quick-start > 'Example 2' rather odd, because this does not preserve dotfiles (user-specific application configuration) - e.g. ~/.local, ~/.config, etc.

That's one of many reasons why I'm building my own Jupyter docker stack incorporating the changes listed above and mounting the entire home directory.

Closing due to inactivity.

@benz0li I do like this proposal (sorry I didn't answer 2 years ago 🙂).

I think the best way for us would be to do the following:

  1. Back up /home/${NB_USER} as the last stage of the Dockerfile.
  2. In the start.sh script restore these files to /home/{raw_username} (and we'll need to fix permissions in some cases).
    We'll only be restoring a file or directory if it doesn't exist already.
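
The restore step could be sketched roughly as follows. This is only an illustration under stated assumptions: restore_missing is a hypothetical helper name, the temporary directories stand in for the backup location and the mounted home directory, and the check is made per top-level entry only (an existing directory is not merged with its backup). Fixing ownership or permissions is left out.

```shell
#!/bin/bash

# Hypothetical helper: copy each top-level entry of a backup directory into
# a target home directory, skipping entries that already exist there.
# The function body runs in a subshell so the shopt settings do not leak.
restore_missing() (
    backup_dir="$1"
    target_dir="$2"
    # Make the glob match dotfiles and expand to nothing for an empty backup.
    shopt -s dotglob nullglob
    mkdir -p "${target_dir}"
    for entry in "${backup_dir}"/*; do
        name="$(basename "${entry}")"
        if [[ ! -e "${target_dir}/${name}" && ! -L "${target_dir}/${name}" ]]; then
            cp -a "${entry}" "${target_dir}/"
        fi
    done
)

# Demo with temporary directories instead of real home directories.
backup="$(mktemp -d)"
home="$(mktemp -d)"
echo "image default" > "${backup}/.bashrc"
mkdir "${backup}/work"
echo "user version" > "${home}/.bashrc"

restore_missing "${backup}" "${home}"

# The user's .bashrc is kept; the missing work directory is restored.
cat "${home}/.bashrc"
ls -A "${home}"

rm -rf "${backup}" "${home}"
```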

I see 2 current issues, which will be solved by such a behaviour:

  1. #1792
  2. #815

> Or is it the work folder?
>
> I find the suggestion of -v "${PWD}":/home/jovyan/work at https://github.com/jupyter/docker-stacks#quick-start > 'Example 2' rather odd, because this does not preserve dotfiles (user-specific application configuration) - e.g. ~/.local, ~/.config, etc.
>
> That's one of many reasons why I'm building my own Jupyter docker stack incorporating the changes listed above and mounting the entire home directory.

I checked that the example works fine though, because:

  1. We mount work subdir
  2. We don't change the NB_USER.
    In such a case we have default .bashrc and other files from the image.

maresb commented

A fresh thought on this... Linux has a default template in /etc/skel for templating home directories of new users. I'm not sure whether or not it would make sense to use this in this case as the "backup" of /home/${NB_USER}.

@mathbunnyru Check the difference from https://github.com/b-data/jupyterlab-python-docker-stack/blob/e0295b86406246873f73d5bc5763866729d9c9d6/base/scripts/usr/local/bin/start.sh to the current file of this repository.
ℹ️ There will be some additional stuff because I use Zsh as default shell and have code-server installed.

I populate with https://github.com/b-data/jupyterlab-python-docker-stack/blob/e0295b86406246873f73d5bc5763866729d9c9d6/base/scripts/usr/local/bin/start-notebook.d/populate.sh using a start-notebook.d hook.

Thanks @maresb, that's a good point.

We have some files that are more specific to our images, like the .jupyter subdir, which might be created by environments or added by users.
Is it a good idea to put such files/dirs into /etc/skel as well?

I don't know if we can actually use /etc/skel unfortunately.
If a user has a custom NB_USER and mounts the homedir, what is the behaviour of /etc/skel?
Is it gonna ignore existing files/dirs, break or overwrite?

Thanks @benz0li!

Could you please tell me how you back up the /home/jovyan files?
Manually in each image (adding newly created or changed files), or automatically by just archiving the whole directory (using some backup.sh-like script)?

It's nice to see someone using our start-notebook.d startup hook, because it is not tested at all 😆

maresb commented

Using /etc/skel may be a bad idea. It will depend on the use case. Also, changing /etc/skel could interfere with the behavior of existing images. I just wanted to point out its existence, but unfortunately I don't have the headspace at the moment to evaluate the merits in this instance.

> Could you please tell me how you back up the /home/jovyan files?

https://github.com/b-data/jupyterlab-python-docker-stack/blob/e0295b86406246873f73d5bc5763866729d9c9d6/base/latest.Dockerfile#L270-L271

> Automatically in each image (adding newly-created or changed files) or just by archiving the whole directory?

I originally wanted this done only in the so-called base images of my JupyterLab docker stacks.

There is one exception, though: https://github.com/b-data/jupyterlab-r-docker-stack/blob/3e81912c6763d0f198901ac0c19a5ce027cfa03f/qgisprocess/latest.Dockerfile#L249-L250

I actually changed my mind - I think it's better to manually back up the files we want to preserve (like .bashrc), rather than backing up the whole dir each time:

  1. It's gonna be more explicit.
  2. It's easy to see in which image a file was created.
  3. We have more control, and we probably don't want to copy something like .lesshst or the wget history file and so on.
  4. I think we'll have most of the copying in the docker-stacks-foundation image and other images won't change at all.
  5. Downside - if someone changes file/dir location, then the image will break - which is obviously rare and not such a big deal.
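
A minimal sketch of such an explicit backup, under stated assumptions: backup_selected is a hypothetical helper, the list of paths is only an example, and temporary directories stand in for /home/jovyan and the backup location. The helper fails when a listed path is missing, which mirrors the downside in point 5.

```shell
#!/bin/bash

# Hypothetical helper: copy an explicit list of paths (relative to the home
# directory) into a fresh backup directory, keeping their relative layout.
# Returns non-zero if a listed path does not exist.
backup_selected() {
    local home_dir="$1" backup_dir="$2" rel
    shift 2
    for rel in "$@"; do
        if [[ ! -e "${home_dir}/${rel}" ]]; then
            echo "Missing ${home_dir}/${rel}" >&2
            return 1
        fi
        mkdir -p "${backup_dir}/$(dirname "${rel}")"
        cp -a "${home_dir}/${rel}" "${backup_dir}/${rel}"
    done
}

# Demo with temporary directories instead of the real home directory.
src="$(mktemp -d)"
dst="$(mktemp -d)"
touch "${src}/.bashrc" "${src}/.wget-hsts"
mkdir "${src}/.jupyter"

# Only the listed entries are backed up; .wget-hsts is left out on purpose.
backup_selected "${src}" "${dst}" .bashrc .jupyter
ls -A "${dst}"

rm -rf "${src}" "${dst}"
```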

Thanks @benz0li!
I see you're copying the whole HOME in base, and one specific dir in some inherited image.

> If a user has a custom NB_USER and mounts the homedir, what is the behaviour of /etc/skel?
> Is it gonna ignore existing files/dirs, break or overwrite?

If I remember correctly, the bind mounted home directory is not populated [with /etc/skel].

I am quite sure that was the reason I copied to /var/backups/skel.

> If I remember correctly, the bind mounted home directory is not populated.

Yes, you're right, but we're calling usermod/useradd-like commands on the mounted dir, which might or might not copy/overwrite files from /etc/skel in the new userdir.
So, @maresb was suggesting a slightly different implementation (which would essentially have the same goal and result).

@benz0li Also, if you can share issues (if you had any) or something we need to be aware of (when using such a backup approach), that would be great.
I appreciate your feedback and ideas.

As far as I understand, you don't manually change ownership of the backup to the new user (if NB_UID was manually set, for example), which might not work, I suppose? (unless the user runs with CHOWN_HOME).

So, my plan is:

  1. Move run-hooks to a separate file
  2. Test this file
  3. Manually add some files to a backup, and add a populate.sh script.
  4. Also, add a test for many possible use cases:
    • NB_USER set or not
    • NB_UID set or not
    • CHOWN_HOME set or not
    • New ${HOME} is mounted or not (where some files might be already present).
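
To get a feeling for the size of that test matrix, the combinations can be enumerated with a short sketch. The function name list_test_cases and the output format are made up for illustration; this only lists the cases and does not run any container.

```shell
#!/bin/bash

# Hypothetical helper: print one line per combination of the four
# on/off conditions from the plan above.
list_test_cases() {
    local nb_user nb_uid chown_home mounted
    for nb_user in unset set; do
        for nb_uid in unset set; do
            for chown_home in unset set; do
                for mounted in no yes; do
                    echo "NB_USER=${nb_user} NB_UID=${nb_uid} CHOWN_HOME=${chown_home} home_mounted=${mounted}"
                done
            done
        done
    done
}

list_test_cases
```

Four binary conditions give 16 combinations, before even distinguishing an empty mounted home from one that already contains files.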

> @benz0li Also, if you can share issues (if you had any) or something we need to be aware of (when using such a backup approach), that would be great. I appreciate your feedback and ideas.

Bind mounting a home directory is quite a delicate matter. My JupyterLab docker stacks allow bind mounting the same home directory by any image, so the init.sh scripts (before-notebook.d hook) were far trickier. E.g.

> As far as I understand, you don't manually change ownership of the backup to the new user (if NB_UID was manually set, for example), which might not work, I suppose? (unless the user runs with CHOWN_HOME).

Correct. See also https://github.com/b-data/jupyterlab-python-docker-stack#create-home-directory


Use case: https://demo.jupyter.b-data.ch

> I checked that the example works fine though, because:
>
>   1. We mount work subdir
>   2. We don't change the NB_USER.
>     In such a case we have default .bashrc and other files from the image.

Correct. Exactly this example is the only exception.

@mathbunnyru ℹ️ I found a way to enable bind mounting a subfolder of the home directory for arbitrary $NB_USERs and thus resolve b-data/jupyterlab-python-docker-stack#1.

Users can now choose whether to (bind) mount the entire home directory or just a subfolder within it.

@benz0li that's great, I see the issue this resolves.

To be honest, I don't understand everything about your implementation (I am not great at shell scripting), though.
I wonder if a Python implementation would make it better or worse.

If anyone wants to implement something similar in this repo, that would be really nice (but we will have to add extensive testing).