- Local
- Interface
  - Mobile shell (`mosh`, no more `ssh`!)
  - Jupyter notebook
- Remote
  - Software
  - Hardware
- Grab a good terminal application! (I personally use iTerm2 for Mac)
- Grab a good font! (I personally use Meslo 13pt)
- Grab a good screen!
Ever feel frustrated when
- you have to reconnect to the server every time your computer wakes up?
- it lags while you are on tethered internet/airplane WiFi?
local~$ mosh $MY_ID@nightingale.csail.mit.edu
Ever feel frustrated when
- you have to remember every argument to properly start notebook?
- you have to copy the port over to browser?
- you get
/usr/bin/xdg-open: 778: /usr/bin/xdg-open: iceweasel: not found
/usr/bin/xdg-open: 778: /usr/bin/xdg-open: seamonkey: not found
...
# remote:~/.bashrc
alias notebook="jupyter notebook --ip 0.0.0.0 --port $MY_FAV_PORT --no-browser"
and always go to `http://nightingale.csail.mit.edu:$MY_FAV_PORT`.
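The alias above expands `$MY_FAV_PORT` once, when `~/.bashrc` is read. A function evaluates it at call time and can also print the URL to open, so the port never has to be copied by hand. This is only a sketch: `notebook_url`, `nb`, and the `8888` fallback are made-up names and values, not part of the original setup.

```shell
# remote:~/.bashrc (sketch; notebook_url, nb and the 8888 fallback are assumptions)

# Print the URL where the notebook server will be reachable.
notebook_url() {
    # $1: host name, $2: port (falls back to $MY_FAV_PORT, then to 8888)
    printf 'http://%s:%s\n' "$1" "${2:-${MY_FAV_PORT:-8888}}"
}

# Same flags as the alias, but announce the URL before starting the server.
nb() {
    notebook_url "$(hostname -f)" "$MY_FAV_PORT"
    jupyter notebook --ip 0.0.0.0 --port "${MY_FAV_PORT:-8888}" --no-browser "$@"
}
```

The function is named `nb` rather than `notebook` so that it does not collide with the alias if both end up in the same file.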
Ever feel frustrated when
- there is not a default virtual environment when you log in?
- you have to type the full `nvidia-smi` every time?
Just add aliases and startup scripts to `~/.bashrc`!
# remote:~/.bashrc
# Activate the default environment only when stdin is a terminal (interactive logins)
if tty -s; then
    source activate "$MY_CONDA_PATH"
fi
alias smi="nvidia-smi"
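If the same `~/.bashrc` is read on machines with different software installed (for example one without NVIDIA drivers), it may help to define an alias only where its target exists. A minimal sketch; `alias_if_exists` is a hypothetical helper, not something the original setup provides.

```shell
# remote:~/.bashrc (sketch; alias_if_exists is a hypothetical helper)

# Define alias $1 for command $2 only when $2 is installed on this machine.
alias_if_exists() {
    if command -v "$2" >/dev/null 2>&1; then
        alias "$1=$2"
    fi
}

alias_if_exists smi nvidia-smi
```

On a machine without `nvidia-smi`, `smi` is simply left undefined instead of pointing at a missing command.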
Some very useful commands
- `htop`: monitors basically everything, from CPU load and memory to process IDs
Ever feel frustrated when
- you want to run very long tasks and need to keep `ssh` open?
- you see this when your session runs over 8 days on the server?
Could not find platform independent libraries <prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: Unable to get the locale encoding
LookupError: no codec search functions registered: can't find encoding
Current thread 0x00007f2e99afc700 (most recent call first):
Aborted
# Getting your Kerberos ticket, "--keychain" enables you to use only "kinit" from now on
local~$ kinit --keychain $MY_ID@CSAIL.MIT.EDU
# Fire up a longtmux session remotely and quit,
local~$ ssh $MY_ID@nightingale.csail.mit.edu longtmux
# because we want to mosh in and tmux [a]ttach to that session
local~$ mosh $MY_ID@nightingale.csail.mit.edu tmux a
When your ticket expires on the server:
remote~$ kinit && aklog
- Custom status bar
- Split panes
- Split tabs
- Mouse support (Yes, you can click in text editors with mouse! Even drag those separators!)
# remote:~/.tmux.conf
set-option -g base-index 1
set-option -g default-terminal "screen-256color"
# ... (see .tmux.conf for more)
Ever feel frustrated when
- you find messy package dependencies that `pip` does not quite manage well?
- you want to share a virtual environment between collaborators?
To create from an exported yml file
# The exported yml should contain a minimal list of dependencies to avoid clutter
remote~$ conda env create -f environment.yml
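A hand-trimmed file can be as short as the one written by the sketch below. The environment name and the package list are placeholders, not the project's real dependencies, and `write_minimal_env` is a hypothetical helper.

```shell
# Write a minimal environment.yml (name and packages are placeholders).
write_minimal_env() {
    # $1: output path, defaults to ./environment.yml
    cat > "${1:-environment.yml}" <<'EOF'
name: my-project
channels:
  - defaults
dependencies:
  - python=3.6.5
  - pip
  - pip:
      - tqdm
EOF
}
```

Running `write_minimal_env environment.yml` and then `conda env create -f environment.yml` would build the environment from that file. Newer conda releases can also produce a short list themselves with `conda env export --from-history`.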
To create a shared environment
# Make sure your current python interpreter is accessible by all users!
remote~$ source deactivate
# Please refer to the hard disk section to check the shared folder to use
remote~$ conda create --prefix $SHARED_FOLDER python=3.6.5 --copy
- `conda` can be slow installing packages because it checks beyond Python package dependencies: it also checks library dependencies.
- When you encounter `OSError: [Errno 28] No space left on device`: this is because `conda` caches packages in your `~/.conda`. Simply run `conda clean -a`.
Ever feel frustrated when
- `conda` takes forever to install simple packages?
remote~$ pip install -r requirements.txt
Useful packages for various purposes
- `gpustat`: GPU stats. Best used as `watch --color gpustat -ucp --color`
- `tqdm`: nice progress bar to monitor training progress
- `htmltag` + `json2html`: pretty demo for your project
- `powerline-shell`: fancy and, most importantly, useful status bar for bash
remote~$ pip install powerline-shell
# remote:~/.bashrc
function _update_ps1() {
PS1=$(powerline-shell $?)
}
if [[ $TERM != linux && ! $PROMPT_COMMAND =~ _update_ps1 ]]; then
PROMPT_COMMAND="_update_ps1; $PROMPT_COMMAND"
fi
Name | Memory | GPU | Best used for |
---|---|---|---|
nightingale | 1008 GB | 4 x GeForce GTX TITAN X | CPU memory intensive tasks |
harrison | 126 GB | 4 x GeForce GTX TITAN X | GPU intensive tasks |
gray | 126 GB | 4 x GeForce GTX TITAN X | GPU intensive tasks |
safar | 193 GB | | CPU intensive tasks |
- Transfer files between machines with `rsync` rather than using the shared disk
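As a rough illustration of the `rsync` advice, the sketch below copies a directory to the same path on another machine. `push_dir` is a hypothetical helper and the flags are one reasonable choice, not a prescribed setup.

```shell
# Copy a directory to the same path on another machine (sketch; push_dir is hypothetical).
push_dir() {
    # $1: directory to copy, $2: destination host
    # -a keeps permissions and timestamps, -z compresses in transit,
    # --partial keeps half-transferred files so an interrupted copy can resume.
    # The trailing slashes make rsync copy the directory's contents into the target.
    rsync -az --partial "${1%/}/" "$2:${1%/}/"
}
```

For example, `push_dir /scratch/definitely-not-cryptomining harrison.csail.mit.edu` would mirror the local working directory onto `harrison`.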
Ever feel frustrated when
- someone takes all available GPU memory without actually performing any computation?
- you have to manage multiple experiments running on multiple machines?
- Limit memory growth!
import tensorflow as tf  # TensorFlow 1.x API
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
- Set the `CUDA_VISIBLE_DEVICES` environment variable before running the program
remote~$ CUDA_VISIBLE_DEVICES=0 python mine-ethereum-hehe.py
- Fill in GPU allocation sheets!
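Choosing the value for `CUDA_VISIBLE_DEVICES` can be scripted. The sketch below reads `index, memory.used` pairs, the CSV layout `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits` prints, and returns the index of the GPU with the least memory in use. `pick_free_gpu` and `train.py` are placeholder names, and this does not replace the allocation sheet.

```shell
# Read "index, memory.used" lines on stdin and print the index of the least-used GPU.
pick_free_gpu() {
    awk -F', *' 'NR == 1 || $2 + 0 < min { min = $2 + 0; idx = $1 } END { print idx }'
}

# Usage on a GPU machine (train.py is a placeholder):
#   gpu=$(nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits | pick_free_gpu)
#   CUDA_VISIBLE_DEVICES=$gpu python train.py
```

Keeping the parsing separate from the `nvidia-smi` call means the function can be tried with hand-written input on a machine without GPUs.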
Ever feel frustrated when
- running out of disk space right before a deadline?
- migrating data across devices to work across machines?
- oh snap I deleted my code!
- Information in the table may not be accurate, please correct me
- Need discussion here: periodic housekeeping?
Type | Mount path | Purpose | Backup |
---|---|---|---|
AFS | `~` or `/afs/csail.mit.edu/u/m/$MY_ID` | lightweight files (code, cache) | `~/.snapshot` |
NFS (production tier) | `/data/medg` | (raw data)? | `/data/medg/misc/.zfs/snapshot` |
Local (each machine) | `/crimea` | datasets | |
Local (each machine) | `/scratch` | (cached data, models)? | |
- How much storage is the current directory using? `du -sh`
- What about the disks? How much space is left? `df -h`
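To check space from a script, for instance before writing a large checkpoint, the `df` output can be reduced to a single number. A minimal sketch; `free_kb` and `has_space` are hypothetical helpers, and the parsing assumes the filesystem name contains no spaces.

```shell
# Print the free space, in KiB, of the filesystem that holds a path.
free_kb() {
    # -P forces the one-line-per-filesystem POSIX layout, -k reports 1024-byte blocks;
    # column 4 of the second line is the "Available" figure.
    df -Pk "${1:-.}" | awk 'NR == 2 { print $4 }'
}

# Succeed only if at least $2 KiB are free under path $1.
has_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}
```

For example, `has_space /scratch 10485760 || echo "less than 10 GiB left on /scratch"` would print a warning when the local disk is nearly full.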
- Figure out a group name for the project, and ask the system admin to create a user group on all machines (or the specific machine you are working on) with all collaborators added to the group.
- Identify a dataset root (e.g., `/data/medg/misc/definitely-not-cryptomining`) with the correct group access (`chgrp -R ...`).
- Locate a folder for code (e.g., `~/definitely-not-cryptomining`). Note that the code should be accessible only by you; any code sharing should happen over version control software.
- Find a local working root directory (e.g., `/scratch/definitely-not-cryptomining`), and sync data over with `rsync`.
- (Optional, but extremely recommended) Instantiate a shared virtual environment in the local directory.
- Happy coding.
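The directory part of the setup steps above can be sketched as one function. Everything here is illustrative: `setup_project` is a hypothetical helper, the group must already exist, and you must be a member of it for `chgrp` to succeed.

```shell
# Prepare per-project directories (sketch; all names and paths are placeholders).
setup_project() {
    # $1: group name, $2: dataset root, $3: code folder, $4: local working root
    mkdir -p "$2" "$3" "$4" || return 1
    # Dataset and working roots: readable and writable by the whole group.
    chgrp -R "$1" "$2" "$4" || return 1
    chmod -R g+rwX "$2" "$4" || return 1
    # Code folder: accessible only by you; share code through version control.
    chmod 700 "$3"
}
```

On AFS, directory access is governed by ACLs rather than Unix mode bits, so the `chmod 700` only has the intended effect when the code folder is not on AFS.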
- When you are running tasks, try `htop` and `nvidia-smi` (or `watch --color gpustat -ucp --color`) to determine the best machines/GPUs. Try not to overuse hardware resources.
- Save intermediate results/models to the local project directory.
- Periodically push your local `git` commits to GitHub.
- I have `mosh` set up, so basically the following login steps are persistent for a few days before I restart the sessions.
- On my local machine I have the following bash function that enables blazing fast access: I only need to `kinit && tmux-csail nightingale`
function tmux-csail() {
    ssh "stmharry@$1.csail.mit.edu" longtmux
    mosh "stmharry@$1.csail.mit.edu" tmux a
}
- I can access different machines on different iTerm tabs, and inside a `tmux` session I also have tabs for file editing, program running, and resource monitoring (i.e., `tmux` tabs under iTerm tabs).
- I use `vim` as the editor for most code editing because of its many handy `python` plugins. I can edit a handful of files at the same time thanks to its support for tabs (i.e., `vim` tabs under `tmux` tabs under iTerm tabs).
- In my `~/.bashrc` there are a few aliases that help me with commands.