Populate home dir if it is empty
benz0li opened this issue · 26 comments
What docker images this feature is applicable to?
jupyter/base-notebook
What changes do you propose?
Populate home dir if it is empty.
How will this change affect users?
If a user mounts to /home/{raw_username}, it will be populated from /home/jovyan on initial startup.
ℹ️ NB_USER set to {raw_username}; the unescaped username (e.g. from JupyterHub).
diff --git a/base-notebook/start.sh b/base-notebook/start.sh
index 05c1037..9e59ab2 100644
--- a/base-notebook/start.sh
+++ b/base-notebook/start.sh
@@ -123,6 +123,15 @@ if [ "$(id -u)" == 0 ] ; then
                     exit 1
                 fi
             fi
+        # The home directory could be bind mounted. Populate it if it is empty
+        elif [[ "$(ls -A "/home/${NB_USER}" 2> /dev/null)" == "" ]]; then
+            _log "Populating home dir /home/${NB_USER}..."
+            if cp -a /home/jovyan/. "/home/${NB_USER}/"; then
+                _log "Success!"
+            else
+                _log "Failed to copy data from /home/jovyan to /home/${NB_USER}!"
+                exit 1
+            fi
         fi
         # Ensure the current working directory is updated to the new path
         if [[ "${PWD}/" == "/home/jovyan/"* ]]; then
|| ln -s /home/jovyan "/home/${NB_USER}" is obsolete, because the home dir is copied [and not moved] now.
In my custom JupyterLab docker stack a duplicate of /home/jovyan is kept at /var/tmp/jovyan. If a user bind mounts to /home/jovyan itself and NB_USER is set to jovyan, the home dir is pre-populated by the following script:
/usr/local/bin/start-notebook.d/populate.sh:
#!/bin/bash
set -e
# Populate /home/jovyan from the duplicate only if it has no entries
if [[ "$(ls -A "/home/jovyan" 2> /dev/null)" == "" ]]; then
    cp -a /var/tmp/jovyan/. /home/jovyan
fi
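The two snippets above share one idea: copy a template home into the target only when the target has no entries. A minimal sketch of that check as a reusable function follows. The function name populate_if_empty and its two arguments are hypothetical and not part of start.sh; the mkdir -p call is an addition for the case where the target does not exist yet.

```shell
# Hypothetical helper: copy a template directory into a target
# only when the target has no entries at all (dotfiles included).
populate_if_empty() {
    local template="$1" target="$2"
    # ls -A lists dotfiles but omits "." and "..";
    # empty output means an empty (or missing) directory.
    if [ -z "$(ls -A "${target}" 2> /dev/null)" ]; then
        # Addition for this sketch: create the target if it is missing.
        mkdir -p "${target}"
        # The trailing "/." copies the contents of the template,
        # including dotfiles, rather than the directory itself.
        cp -a "${template}/." "${target}/"
    fi
}
```

Called as populate_if_empty /var/tmp/jovyan /home/jovyan, this would behave like the populate.sh hook above; a second call leaves existing files untouched because the target is no longer empty.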
Which are the files here that you care about? Is your home directory totally empty, like missing .bashrc? Or is it the work folder? I'm trying to understand what test cases we should write.
Yes, the home directory is entirely empty if bind mounted to /home/{raw_username}. As it exists, ! -e "/home/${NB_USER}" is FALSE, thus nothing is copied from /home/jovyan/ to /home/${NB_USER}.
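To see why the existence check does not fire while the emptiness check from the patch does, compare the two tests on an empty directory. The sketch below uses a temporary directory as a stand-in for an empty bind mount (an assumption for illustration; no container or mount is involved), and the variable names are hypothetical.

```shell
# Stand-in for an empty bind-mounted home directory (path comes from mktemp).
mount_point="$(mktemp -d)"

# Existence test: the directory exists, so "! -e" is false
# and the populate branch is skipped.
if [ ! -e "${mount_point}" ]; then
    exists_check="would populate"
else
    exists_check="skipped"
fi

# Emptiness test: the directory has no entries, so this branch runs.
if [ -z "$(ls -A "${mount_point}" 2> /dev/null)" ]; then
    empty_check="would populate"
else
    empty_check="skipped"
fi

echo "existence check: ${exists_check}"
echo "emptiness check: ${empty_check}"
rmdir "${mount_point}"
```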
ℹ️ NB_USER set to {raw_username}
It's the whole content of /home/jovyan/ I care about.
For Jupyter's base-notebook this would take care of
-rw-rw-r-- 1 jovyan users 220 Feb 25 2020 .bash_logout
-rw-rw-r-- 1 jovyan users 3823 Nov 8 05:07 .bashrc
drwsrwsr-x 2 jovyan users 4096 Nov 8 05:09 .cache
drwsrwsr-x 1 jovyan users 4096 Nov 8 05:08 .conda
drwsrws--- 3 jovyan users 4096 Nov 8 05:09 .config
drwsrws--- 2 jovyan users 4096 Nov 8 05:09 .jupyter
-rw-rw-r-- 1 jovyan users 807 Feb 25 2020 .profile
-rw-rw-r-- 1 jovyan users 227 Nov 8 05:07 .wget-hsts
drwsrwsr-x 2 jovyan users 4096 Nov 8 05:07 work
For my custom JupyterLab image this would take care of
-rw-r--r-- 1 jovyan users 220 Aug 4 20:25 .bash_logout
-rw-r--r-- 1 jovyan users 3768 Nov 5 10:50 .bashrc
drwxr-xr-x 3 jovyan users 4096 Nov 5 10:50 .local
drwxr-xr-x 12 jovyan users 4096 Nov 5 10:50 .oh-my-zsh
-rw-r--r-- 1 jovyan users 807 Aug 4 20:25 .profile
-rw-r--r-- 1 jovyan users 4219 Nov 5 10:50 .zshrc
Only for the initial/first startup and if the mounted home directory at /home/${NB_USER} is empty.
@consideRatio the merge seems to have unintentionally caused this issue to be closed prematurely. (Probably I shouldn't have had "close #1478" in my description! 😆)
Or is it the work folder?
I find the suggestion of -v "${PWD}":/home/jovyan/work at https://github.com/jupyter/docker-stacks#quick-start > 'Example 2' rather odd, because this does not preserve dotfiles (user-specific application configuration) - e.g. ~/.local, ~/.config, etc.
That's one of many reasons why I'm building my own Jupyter docker stack incorporating the changes listed above and mounting the entire home directory.
Closing due to inactivity.
@benz0li I do like this proposal (sorry I didn't answer 2 years ago 🙂).
I think the best way for us would be to do the following:
- Back up /home/${NB_USER} as the last stage of the Dockerfile.
- In the start.sh script, restore these files to /home/{raw_username} (and we'll need to fix permissions in some cases).
We'll only be restoring a file or directory if it doesn't exist already.
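The "restore only what is missing" rule can be sketched as a loop over the top-level entries of the backup. This is a minimal illustration, not the project's implementation: the function name restore_missing and its arguments are hypothetical, and fixing ownership or permissions afterwards is left out.

```shell
# Hypothetical helper: restore top-level entries from a backup into a
# home directory, skipping anything that already exists there.
restore_missing() {
    local backup="$1" home="$2" entry name
    # The three patterns together match regular names, dotfiles, and
    # names starting with "..", but never "." or ".." themselves.
    for entry in "${backup}"/* "${backup}"/.[!.]* "${backup}"/..?*; do
        # A pattern without matches stays literal; skip it.
        [ -e "${entry}" ] || [ -L "${entry}" ] || continue
        name="$(basename "${entry}")"
        # Keep whatever the user already has, even a dangling symlink.
        if [ -e "${home}/${name}" ] || [ -L "${home}/${name}" ]; then
            continue
        fi
        cp -a "${entry}" "${home}/${name}"
    done
}
```

Because the check is per top-level entry, an existing .config directory in the mounted home would be left exactly as it is, even if the backup contains files inside .config that the user does not have.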
I see 2 current issues, which will be solved by such a behaviour:
Or is it the work folder?
I find the suggestion of -v "${PWD}":/home/jovyan/work at https://github.com/jupyter/docker-stacks#quick-start > 'Example 2' rather odd, because this does not preserve dotfiles (user-specific application configuration) - e.g. ~/.local, ~/.config, etc.
That's one of many reasons why I'm building my own Jupyter docker stack incorporating the changes listed above and mounting the entire home directory.
I checked that the example works fine though, because:
- We mount the work subdir.
- We don't change the NB_USER.
In such a case we have the default .bashrc and other files from the image.
A fresh thought on this... Linux has a default template in /etc/skel for templating home directories of new users. I'm not sure whether or not it would make sense to use this in this case as the "backup" of /home/${NB_USER}.
@mathbunnyru Check the difference from https://github.com/b-data/jupyterlab-python-docker-stack/blob/e0295b86406246873f73d5bc5763866729d9c9d6/base/scripts/usr/local/bin/start.sh to the current file of this repository.
ℹ️ There will be some additional stuff because I use Zsh as default shell and have code-server installed.
I populate with https://github.com/b-data/jupyterlab-python-docker-stack/blob/e0295b86406246873f73d5bc5763866729d9c9d6/base/scripts/usr/local/bin/start-notebook.d/populate.sh using a start-notebook.d hook.
Thanks @maresb, that's a good point.
We have some files which are more specific to our images, like the .jupyter subdir, which might contain environments added by users.
Is it a good idea to put such files/dirs into /etc/skel as well?
Unfortunately, I don't know if we can actually use /etc/skel.
If a user has a custom NB_USER and mounts the homedir, what is the behaviour of /etc/skel?
Is it gonna ignore existing files/dirs, break or overwrite?
Thanks @benz0li!
Could you please tell me how you back up the /home/jovyan files?
Manually in each image (adding newly-created or changed files) or automatically, just by archiving the whole directory (using some backup.sh-like script)?
It's nice to see someone using our start-notebook.d startup hook, because it is not tested at all 😆
Using /etc/skel may be a bad idea. It will depend on the use case. Also, changing /etc/skel could interfere with the behavior of existing images. I just wanted to point out its existence, but unfortunately I don't have the headspace at the moment to evaluate the merits in this instance.
Could you please tell, how do you backup /home/jovyan files?
Automatically in each image (adding newly-created or changed files) or just by archiving the whole directory?
I originally wanted this done only in the so-called base images of my JupyterLab docker stacks.
There is one exception, though: https://github.com/b-data/jupyterlab-r-docker-stack/blob/3e81912c6763d0f198901ac0c19a5ce027cfa03f/qgisprocess/latest.Dockerfile#L249-L250
I actually changed my mind - I think it's better to manually back up the files we want to preserve (like .bashrc), rather than backing up the whole dir each time:
- It's gonna be more explicit.
- It's easy to see in which image a file was created.
- We have more control, and we probably don't want to copy something like .lesshst or the wget history file and so on.
- I think we'll have most of the copying in the docker-stacks-foundation image and other images won't change at all.
- Downside - if someone changes a file/dir location, then the image will break - which is obviously rare and not such a big deal.
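An explicit backup along these lines could look like the following sketch. It assumes a hypothetical helper backup_listed that receives the home directory, a fresh backup directory, and the list of entries to keep; none of these names come from the repository. A listed entry that no longer exists makes the helper fail, which corresponds to the downside in the last bullet: a moved file breaks the build rather than being silently dropped.

```shell
# Hypothetical helper: copy an explicit list of entries from a home
# directory into a (fresh) backup directory, preserving relative paths.
backup_listed() {
    local home="$1" backup="$2" rel
    shift 2
    for rel in "$@"; do
        # Fail loudly if a listed entry has moved or disappeared.
        if [ ! -e "${home}/${rel}" ]; then
            echo "backup_listed: ${home}/${rel} not found" >&2
            return 1
        fi
        mkdir -p "${backup}/$(dirname "${rel}")"
        cp -a "${home}/${rel}" "${backup}/${rel}"
    done
}
```

For example, backup_listed /home/jovyan /some/backup/dir .bashrc .profile .jupyter would copy only those three entries and leave files such as .lesshst behind; the backup path here is a placeholder.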
Thanks @benz0li!
I see you're copying the whole HOME in base, and one specific dir in some inherited image.
If a user has a custom NB_USER and mounts the homedir, what is the behaviour of /etc/skel?
Is it gonna ignore existing files/dirs, break or overwrite?
If I remember correctly, the bind mounted home directory is not populated [with /etc/skel].
I am quite sure that was the reason I copied to /var/backups/skel.
> If I remember correctly, the bind mounted home directory is not populated.
Yes, you're right, but we're calling usermod/useradd-like commands on mounted dir, which might or might not copy/overwrite files from /etc/skel in the new userdir.
So, @maresb was suggesting a slightly different implementation (which would essentially have the same goal and result).
@benz0li Also, if you can share issues (if you had any) or something we need to be aware of (when using such a backup approach), that would be great.
I appreciate your feedback and ideas.
As far as I understand, you don't manually change ownership of backup to new user (if NB_UID was manually set for example), which might not work, I suppose? (unless user runs with CHOWN_HOME).
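The ownership concern can be made concrete with a small check: files copied with cp -a keep the owner they had in the image, so a home directory populated for a different NB_UID may end up owned by the wrong user. The sketch below is only an illustration under that assumption; owner_differs and fix_home_owner are hypothetical names, and changing ownership to another user normally requires root.

```shell
# Hypothetical helper: succeed (exit 0) when the directory's numeric
# owner differs from the expected UID.
owner_differs() {
    local dir="$1" expected_uid="$2" actual_uid
    # ls -n prints numeric IDs; with -d the third field is the owner
    # UID of the directory itself.
    actual_uid="$(ls -nd "${dir}" | awk '{print $3}')"
    [ "${actual_uid}" != "${expected_uid}" ]
}

# Hypothetical helper: hand a populated home directory over to the
# target UID, but only when the owner does not already match.
fix_home_owner() {
    local dir="$1" uid="$2"
    if owner_differs "${dir}" "${uid}"; then
        chown -R "${uid}" "${dir}"
    fi
}
```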
So, my plan is:
- Move run-hooks to a separate file
- Test this file
- Manually add some files to a backup and a populate.sh script.
- Also, add a test for many possible use cases:
  - NB_USER set or not
  - NB_UID set or not
  - CHOWN_HOME set or not
  - New ${HOME} is mounted or not (where some files might be already present).
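The four on/off settings in this plan give 16 combinations. A small sketch that enumerates them is shown below; the function name list_test_cases, the set/unset and yes/no labels, and the output format are illustrative assumptions, and the real tests could equally be written in another framework.

```shell
# Print every combination of the four settings, one per line.
# Labels and output format are assumptions made for this sketch.
list_test_cases() {
    local nb_user nb_uid chown_home mounted
    for nb_user in unset set; do
        for nb_uid in unset set; do
            for chown_home in unset set; do
                for mounted in no yes; do
                    echo "NB_USER=${nb_user} NB_UID=${nb_uid} CHOWN_HOME=${chown_home} home_mounted=${mounted}"
                done
            done
        done
    done
}
```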
@benz0li Also, if you can share issues (if you had any) or something we need to be aware of (when using such a backup approach), that would be great. I appreciate your feedback and ideas.
Bind mounting a home directory is quite a delicate matter. My JupyterLab docker stacks allow bind mounting the same home directory by any image, so the init.sh scripts (before-notebook.d hook) were far trickier. E.g.
- https://github.com/b-data/jupyterlab-r-docker-stack/blob/3e81912c6763d0f198901ac0c19a5ce027cfa03f/base/scripts/usr/local/bin/before-notebook.d/init.sh
- https://github.com/b-data/jupyterlab-r-docker-stack/blob/3e81912c6763d0f198901ac0c19a5ce027cfa03f/qgisprocess/scripts/usr/local/bin/before-notebook.d/init.sh
ℹ️ Separate variant of the script for the qgisprocess image.
As far as I understand, you don't manually change ownership of backup to new user (if NB_UID was manually set for example), which might not work, I suppose? (unless user runs with CHOWN_HOME).
Correct. See also https://github.com/b-data/jupyterlab-python-docker-stack#create-home-directory
Use case: https://demo.jupyter.b-data.ch
I checked that the example works fine though, because:
- We mount the work subdir.
- We don't change the NB_USER.
In such a case we have the default .bashrc and other files from the image.
Correct. Exactly this example is the only exception.
@mathbunnyru ℹ️ I found a way to enable bind mounting a subfolder of the home directory for arbitrary $NB_USERs and thus resolve b-data/jupyterlab-python-docker-stack#1.
Users can now choose whether to (bind) mount the entire home directory or just a subfolder within it.
@benz0li that's great, I see the issue this resolves.
To be honest, I don't understand everything about your implementation (I am not great at shell scripting), though.
I wonder if a Python implementation would make it better or worse.
If anyone wants to implement a similar thing in this repo, that would be really nice (but we will have to add extensive testing).