Table of contents:
- 1. Overview
- 2. Non-exhaustive list of customizations
- 3. Installation
- 4. Usage
- 5. Appendix
- 6. Related Projects
- 7. Security
- 8. License
- 9. Acknowledgements
This repo contains scripts to re-run common tweaks on a fresh (i.e., newly created or rebooted) SageMaker classic notebook instance, to make the notebook instance a little bit more ergonomic for prolonged usage.
After running these scripts your default command-line terminal will go from this:
To something like this:
Once installed, every time you access a newly restarted notebook instance, you only need to perform these three short steps to see the customizations:
- open a terminal,
- run the one-liner command ~/SageMaker/initsmnb/setup-my-sagemaker.sh,
- restart the Jupyter process.
By supporting a simple one-liner command, we hope that you can quickly test this repo as a data scientist (with your notebook instance as all that you need), rather than as an infrastructure engineer who typically works with more sophisticated automation tools or services.
We hope that you find this repo useful to adopt into your daily work habits.
Please note that tweaks marked with [Need sudo] take effect only when your notebook instance enables root access for notebook users, and tweaks marked with [Need internet] require an internet connection.
- Jupyter Lab:
  - Reduce font size on Jupyter Lab, and show line numbers on editors.
  - [Need sudo] Terminal defaults to bash shell, dark theme, and smaller font.
  - [Need sudo] In addition to SageMaker's built-in conda environments, Jupyter Lab also auto-scans /home/ec2-user/SageMaker/envs/ for custom conda environments.
    This allows for a "persistent" conda environment under /home/ec2-user/SageMaker/envs that survives instance reboot.
    You can create a new custom conda environment as follows: conda create --prefix /home/ec2-user/SageMaker/envs/MY_CUSTOM_ENV_NAME python=3.9 ipykernel. Replace the environment name and Python version with your choice. Please note that the conda environment must have the ipykernel package installed. Once the environment is created, you may need to restart JupyterLab before you can see the environment listed as one of the kernels.
- Git:
  - Optionally change committer's name and email, which defaults to ec2-user.
  - git aliases: git lol, git lola, git lolc, and git lolac.
  - New repo (i.e., git init) defaults to branch main.
  - nbdime for notebook-friendly diffs.
- Terminal:
  - bash shortcuts: alt-., alt-b, alt-d, and alt-f work even when connecting from OSX.
  - [Need sudo & internet] Install command lines: htop, tree, dos2unix, dstat, tig (alinux only), ranger (the CLI file explorer), cookiecutter, pre-commit, s4cmd, black-nb, black, jupytext.
  - pre-commit caches of hook repositories survive reboots.
  - ranger is configured to use relative line numbers.
  - Whenever possible, commands are installed to the persistent area under ~/SageMaker/.initsmnb.d/, so that on reboot, the tweaking script can skip re-installing those commands to speed up the tweaking time.
- ipython run from Jupyter Lab's terminal:
  - shortcuts: alt-., alt-b, alt-d, and alt-f work even when connecting from OSX.
  - recolor o.__class__ from dark blue (nearly invisible on the dark theme) to a more sane color.
- Some customizations on vim:
  - Notably, change window navigation shortcuts from ctrl-w-{h,j,k,l} to ctrl-{h,j,k,l}.
    The default is impractical because ctrl-w is used by most browsers on Linux (and Windows?) to close a browser tab, which renders window navigation in vim unusable.
  - Other opinionated changes; see init-vim.sh.
- [Need sudo] Optionally mount one or more EFS.
- Experimental tweaks (off by default).
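For the persistent conda environments mentioned in the list above, the following is a minimal sketch of how you might check which environments under /home/ec2-user/SageMaker/envs/ carry the required ipykernel package. It relies on conda recording each installed package as a JSON file under the environment's conda-meta/ directory. The function names are hypothetical and not part of this repo, and an ipykernel installed with pip instead of conda would not be detected this way.

```shell
# Report whether a conda environment directory records ipykernel among its
# conda-installed packages. conda writes one JSON file per package to
# <prefix>/conda-meta/, named <package>-<version>-<build>.json.
env_has_ipykernel() (
    prefix="$1"
    for meta in "$prefix"/conda-meta/ipykernel-*.json; do
        # An unmatched glob stays literal, so test that the file really exists.
        [ -f "$meta" ] && exit 0
    done
    exit 1
)

# List every environment directory under the persistent area and flag the ones
# without ipykernel. The default path is the one named in this README.
list_custom_envs() (
    envs_dir="${1:-/home/ec2-user/SageMaker/envs}"
    for env in "$envs_dir"/*/; do
        [ -d "$env" ] || continue
        name="$(basename "$env")"
        if env_has_ipykernel "$env"; then
            echo "$name: ok"
        else
            echo "$name: missing ipykernel"
        fi
    done
)
```

Calling list_custom_envs without arguments inspects the default location. An environment flagged as missing ipykernel is unlikely to show up as a kernel in Jupyter Lab.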
This step needs to be done once on a newly created notebook instance.
You can choose to have the installation process automatically download the necessary files from this repo, provided that your SageMaker classic notebook instance has the necessary network access to this repo.
Another choice is to bootstrap this repo into your SageMaker classic notebook instance, then invoke the install script in its local mode.
Go to the Jupyter Lab on your SageMaker notebook instance. Open a terminal, then run this command:
curl -sfL \
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-notebook-instance-customization/main/initsmnb/install-initsmnb.sh \
| bash -s -- --git-user 'First Last' --git-email 'ab@email.abc'
Both the --git-user 'First Last' and --git-email ab@email.abc arguments are optional. If you're happy with SageMaker's preset (which uses ec2-user as the committer name), you can drop these two arguments from the install command.
If you want to auto-mount one or more EFS, install as follows:
curl -sfL \
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-notebook-instance-customization/main/initsmnb/install-initsmnb.sh \
| bash -s -- \
--git-user 'First Last' \
--git-email 'ab@email.abc' \
--efs 'fs-123,fsap-123,my_efs_01' \
--efs 'fs-456,fsap-456,my_efs_02'
All mount points will live under /home/ec2-user/mnt/. Thus, the above example will install a script that can mount two EFS: the first one, fs-123, will be mounted as /home/ec2-user/mnt/my_efs_01/, while the second one, fs-456, will be mounted as /home/ec2-user/mnt/my_efs_02/.
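To make the mapping from an --efs value to its mount point concrete, here is a small sketch that splits the comma-separated value and prints the resulting path. It is not the installer's own parsing code. The function name is hypothetical, and reading the second field (fsap-...) as an EFS access point ID is an assumption based on its prefix.

```shell
# Split one --efs value of the form '<file-system>,<second-field>,<mount-name>'
# and print where it would be mounted. The mount root is the one stated in this
# README; treating the second field as an access point ID is an assumption.
MNT_ROOT="/home/ec2-user/mnt"

describe_efs_spec() (
    spec="$1"
    # Read the three comma-separated fields; "extra" catches anything unexpected.
    IFS=, read -r fs_id ap_id mount_name extra <<EOF
$spec
EOF
    if [ -z "$fs_id" ] || [ -z "$ap_id" ] || [ -z "$mount_name" ] || [ -n "$extra" ]; then
        echo "invalid --efs value: $spec" >&2
        exit 1
    fi
    echo "$fs_id (access point $ap_id) -> $MNT_ROOT/$mount_name/"
)

describe_efs_spec 'fs-123,fsap-123,my_efs_01'
describe_efs_spec 'fs-456,fsap-456,my_efs_02'
```

The sketch only describes the mapping and does not mount anything.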
After the installation step finishes, you should see a new directory created: /home/ec2-user/SageMaker/initsmnb/.
Your next step is to jump to section Usage.
On your SageMaker notebook instance, open a terminal and run these commands:
cd ~/SageMaker
git clone https://github.com/aws-samples/amazon-sagemaker-notebook-instance-customization.git
cd amazon-sagemaker-notebook-instance-customization/initsmnb
./install-initsmnb.sh --from-local --git-user 'First Last' --git-email 'ab@email.abc'
After the installation step finishes, you should see a new directory created: /home/ec2-user/SageMaker/initsmnb/.
Your next step is to jump to section Usage.
Once installed, you should see the file /home/ec2-user/SageMaker/initsmnb/setup-my-sagemaker.sh.
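If you prefer to confirm this from the terminal, a check along these lines could be used. It is only a sketch: the function name is hypothetical, and the default path is the one stated above.

```shell
# Check that the installer left a setup script in the expected place.
# The default directory is the one named in this README.
check_initsmnb() (
    dir="${1:-/home/ec2-user/SageMaker/initsmnb}"
    script="$dir/setup-my-sagemaker.sh"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        exit 1
    elif [ ! -f "$script" ]; then
        echo "missing script: $script"
        exit 1
    fi
    echo "found: $script"
)
```

Run check_initsmnb with no arguments on the notebook instance. A non-zero exit status suggests the installation step did not complete.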
To apply the customizations to the current session, open a terminal and run ~/SageMaker/initsmnb/setup-my-sagemaker.sh. Once the script finishes, please follow the on-screen instructions to restart the Jupyter server (and after that, do remember to reload your browser tab).
Due to how SageMaker notebook works, please re-run setup-my-sagemaker.sh on a newly started or restarted instance. You may even consider automating this step using a SageMaker lifecycle config.
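As a starting point for that automation, the sketch below prints a candidate on-start lifecycle script. It is not shipped with this repo. It assumes that setup-my-sagemaker.sh can run unattended, which you should verify first, and the function name is hypothetical. Lifecycle scripts run as root, so the sketch invokes the setup script as ec2-user.

```shell
# Print a candidate on-start lifecycle script to stdout. Lifecycle scripts run
# as root, so the setup script is invoked as ec2-user, the notebook user.
# The path in SETUP_SCRIPT is the one named in this README.
print_on_start_script() {
    cat <<'EOF'
#!/bin/bash
set -e

SETUP_SCRIPT=/home/ec2-user/SageMaker/initsmnb/setup-my-sagemaker.sh

# Skip quietly on instances where initsmnb has not been installed yet.
if [ -x "$SETUP_SCRIPT" ]; then
    sudo -u ec2-user -i "$SETUP_SCRIPT"
fi
EOF
}
```

You could save the output with print_on_start_script > on-start.sh and paste it into a lifecycle configuration. Keep in mind that lifecycle configuration scripts are limited to five minutes of run time, so a slow first-time setup may need a different approach.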
On the Jupyter Lab's terminal, run this command:
# For notebook instance with alinux
sudo initctl restart jupyter-server --no-wait
# Use this instead, for notebook instance with alinux2
sudo systemctl restart jupyter-server
After issuing the command, your Jupyter interface will probably freeze, which is expected.
Then, reload your browser tab, and enjoy the new experience.
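If you are unsure which of the two restart commands applies to your instance, a sketch like the following may help. It prints the matching command without running it. Detecting the init system by the presence of systemctl is an assumption of this sketch, and the function names are hypothetical.

```shell
# Print the restart command from this README that matches the init system:
# upstart on alinux, systemd on alinux2.
jupyter_restart_cmd() {
    case "$1" in
        systemd) echo "sudo systemctl restart jupyter-server" ;;
        upstart) echo "sudo initctl restart jupyter-server --no-wait" ;;
        *) echo "unknown init system: $1" >&2; return 1 ;;
    esac
}

# Guess the init system; the presence of systemctl is taken to mean systemd.
detect_init_system() {
    if command -v systemctl >/dev/null 2>&1; then
        echo systemd
    else
        echo upstart
    fi
}

# Show the command without running it.
jupyter_restart_cmd "$(detect_init_system)"
```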
To change the terminal font size, after installation:
- open /home/ec2-user/SageMaker/initsmnb/change-jlab-ui.sh in a text editor,
- go to the section that customizes the terminal,
- then change the fontsize (default is 10) to another value of your choice.
Advanced users may want to explore and enable the experimental tweaks. These are off by default, and must be enabled by modifying ~/SageMaker/initsmnb/setup-my-sagemaker.sh to set ENABLE_EXPERIMENTAL=1. Please refer to the script itself for the details of the tweaks.
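Instead of editing the file by hand, the flag could be flipped with a small helper such as the one sketched here. It assumes that setup-my-sagemaker.sh contains a line starting with ENABLE_EXPERIMENTAL=, which you should confirm by inspecting the script. The function name is hypothetical.

```shell
# Set ENABLE_EXPERIMENTAL=1 in the given script, keeping a .bak copy.
# Assumes the script holds a line starting with "ENABLE_EXPERIMENTAL=".
enable_experimental() (
    script="$1"
    if ! grep -q '^ENABLE_EXPERIMENTAL=' "$script"; then
        echo "no ENABLE_EXPERIMENTAL= line found in $script" >&2
        exit 1
    fi
    cp "$script" "$script.bak"
    # Writing back through a redirect keeps the original file permissions.
    sed 's/^ENABLE_EXPERIMENTAL=.*/ENABLE_EXPERIMENTAL=1/' "$script.bak" > "$script"
)
```

For example, enable_experimental ~/SageMaker/initsmnb/setup-my-sagemaker.sh would update the script in place and leave the previous version next to it with a .bak suffix.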
Presently, these are the experimental tweaks:
- disable the git extension for Jupyter Lab. This is aimed at power users who primarily use git from the CLI, and do not want to be distracted by Jupyter Lab's frequent refreshes on the lower-left status bar.
- enable SageMaker local mode.
- centralize notebook checkpoints to /tmp/.ipynb_checkpoints/. This prevents .ipynb_checkpoints/ from making its way into the tarballs generated by SageMaker SDK for training, inference, and framework processing scripts, and model repack.
- relocate docker's data-root to persistent area ~/SageMaker/.initsmnb.d/docker/, so that after reboot, the docker images command no longer shows an empty list (provided you've run docker build or docker pull before).
- relocate docker's tmpdir to persistent area ~/SageMaker/.initsmnb.d/tmp/, so that you can build large custom images that require more space than what /tmp (i.e., on root volume) provides.
  - a secondary benefit is to allow SageMaker local mode to run with S3 input that's larger than what /tmp (i.e., on root volume) provides. Please note SageMaker local mode will copy the S3 input to the docker's tmpdir, but upon completion the SDK won't remove the tmp dir. Hence, you need to manually remove the temporary S3 inputs from the persistent docker's tmpdir.
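For the manual cleanup of the relocated docker tmpdir mentioned above, a dry-run listing such as this sketch can help you spot leftovers before deleting anything. The function name and the seven-day default are hypothetical choices, and the names that SageMaker local mode gives its temporary directories are not assumed here.

```shell
# List top-level entries of a directory whose last modification is more than
# N days old (find's -mtime +N counts whole days). Nothing is deleted; review
# the list before removing anything by hand.
list_stale_tmp() (
    dir="${1:-$HOME/SageMaker/.initsmnb.d/tmp}"
    days="${2:-7}"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; exit 1; }
    find "$dir" -mindepth 1 -maxdepth 1 -mtime +"$days"
)
```

Note that a directory's modification time only reflects changes to its direct entries, so treat the output as a hint and inspect each entry before removing it.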
Once you've customized your development environment on your SageMaker classic notebook instance, we invite you to explore related samples.
- aws-samples/python-data-science-template shows a one-liner command that instantaneously auto-generates a modular structure for your new Python-based data science project.
- aws-samples/amazon-sagemaker-entrypoint-utilities is a sample library to help you quickly write a SageMaker meta-entrypoint script for training. This approach aims to reduce the amount of boilerplate code you need to write for model training, such as argument parsing and logger configuration, which are repetitive and tedious.
- ML Max is a set of example templates to accelerate the delivery of custom ML solutions to production so you can get started quickly without having to make too many design choices. At present, it covers four pillars: training pipeline, inference pipeline, development environment, and data management/ETL.
- Learn about a different mechanism to create a custom Jupyter kernel on a SageMaker classic notebook instance, described in aws-samples/aws-sagemaker-custom-jupyter-kernel.
- Wearing an "infrastructure engineer" hat -- when you're ready or allowed to implement the customizations as a lifecycle configuration for your SageMaker notebook instance, feel free to further explore these examples.
- Data science on Amazon EC2 with vim, tmux and zsh hosts a simple template to set up basic Vim, Tmux, Zsh for the Deep Learning AMI Amazon Linux 2 for data scientists.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.
@yapweiyih (EFS), @josiahdavis and @kianho (vim), @theoldfather (docker relocation), @aws/amazon-sagemaker-examples (SageMaker local mode), @yinsong1986 (persistent custom conda environment), @verdimrc (misc.), the originator of git lol & lola (the earliest traceable source could be this blog).