Setting up VMs for use in a training workshop involves the following steps:
- Create a "template VM" containing the workshop software and data
- Image the template VM
- Instantiate a "test VM" from the image
- Set up test user
- Test workshop content
- Instantiate multiple workshop VMs from the image
- Set up workshop users
Create a vanilla Ubuntu VM in the cloud (e.g. AWS or GCP). SSH into it and clone this repository:
ssh -J hpc.sahmri.com 34.116.110.196
sudo su -
git clone https://github.com/sagc-bioinformatics/workshop-sysadmin.git
cd workshop-sysadmin
We'll install some useful tools and set up some configuration defaults:
apt update
apt dist-upgrade -y
apt install -y \
python3 \
tree \
bmon \
htop \
screen
#####
# Change timezone
#####
dpkg-reconfigure tzdata
#####
# Setup conda
#####
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Install to /opt/miniconda3
bash ./Miniconda3-latest-Linux-x86_64.sh \
-b \
-p /opt/miniconda3
source /opt/miniconda3/etc/profile.d/conda.sh
#####
# Add conda initialisation for new users
#####
cat .bashrc >>/etc/skel/.bashrc
#####
# Setup screenrc for new users
#####
cat .screenrc >>/etc/skel/.screenrc
Software:
#####
# Setup conda environment for all users
#####
mkdir -p /data/envs
cp wgbs.yaml /data/envs/wgbs.yaml
# Install environments
conda install mamba -n base -c conda-forge -y
mamba env create -f /data/envs/wgbs.yaml
# If you need to update the environment
#mamba env update --name wgbs --file /data/envs/wgbs.yaml
sudo apt install -y \
python3-pip
pip3 install cpgtools
Put workshop data on the VM. This assumes the data has already been generated somehow and is simply available to download from a public source.
#####
# Download from CloudStor
#####
curl https://cloudstor.aarnet.edu.au/plus/s/HBCVy6IKjVTJYq7/download \
> wgbs.tar
tar -C / -xf wgbs.tar && rm wgbs.tar
While creating/testing the workshop content, it is advantageous to push the workshop data to a publicly accessible location. We can then pull that data onto each new VM during instantiation. The easiest way to achieve this is to create a tar archive and push it to CloudStor:
cd /data
sudo tar -cf wgbs.tar wgbs
sudo apt install rclone
# Set these variables to your CloudStor username and the App password you generated
USERNAME='my_cloudstor_username'
APP_PASSWORD='MY_CLOUDSTOR_APP_PASSWORD'
# Now create the rclone configuration
rclone config create \
SAGC_CloudStor \
webdav \
url https://cloudstor.aarnet.edu.au/plus/remote.php/webdav/ \
vendor owncloud \
user "${USERNAME}" \
pass "${APP_PASSWORD}" \
--obscure
rclone copy wgbs.tar SAGC_CloudStor:Shared/workshops/
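Because every workshop VM later downloads this archive, it can help to record a checksum when the archive is created and to check it after each download. The following is a minimal sketch using sha256sum from GNU coreutils. The .sha256 file name and the two function names are assumptions for illustration and are not part of this repository.

```shell
# Hypothetical helpers (not part of this repository).

# Write <archive>.sha256 next to the archive.
write_checksum() {
  local archive=$1
  (
    cd "$(dirname "${archive}")" &&
      sha256sum "$(basename "${archive}")" > "$(basename "${archive}").sha256"
  )
}

# Return 0 only if the archive still matches its recorded checksum.
verify_checksum() {
  local archive=$1
  (
    cd "$(dirname "${archive}")" &&
      sha256sum --check --quiet "$(basename "${archive}").sha256"
  )
}

# Example usage:
# write_checksum /data/wgbs.tar      # before uploading
# verify_checksum /data/wgbs.tar     # after downloading the archive and its .sha256 file
```

The checksum file would need to be uploaded alongside the archive for the check on the VM to be possible.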
The image generated from the template VM doesn't have any active user accounts. We don't want an image containing default usernames and passwords, since every VM would then have the same login details and these would likely leak into the public domain at some point. This would be a security risk!
Instead, we have a script (trainusers.sh) capable of managing user accounts, which must be used after VM instantiation to:
- Create user accounts
- Enable password logins via SSH
- Enable user accounts
Similarly, this script can be used at the end of the workshop to:
- Disable user accounts
- Disable password logins via SSH
- Delete user accounts
The trainusers.sh help:
$ ./trainusers.sh -h
USAGE: trainusers.sh [-h] -a <action> [-u <user_prefix>] [-s <start>] [-e <end>] [-z <zero_padding>] [-l <length_of_password>] [[-p <pwgen_arg>] ...]
User/password creation/management.
where:
-h Show this help text
-a Action to perform (create|chpass|lock|unlock|pwon|pwoff|delete)
create: create users with passwords
chpass: change the user's passwords
lock: lock user's passwords
unlock: unlock user's passwords
pwon: allow password authentication in SSH
pwoff: disable password authentication in SSH
delete: delete user accounts
-u Username prefix (default: sagc)
-s Username suffix start number (default: 1)
-e Username suffix end number (default: 10)
-z Zero-pad username suffix number to this length (default:3)
-l Password length (default: 12)
-p Arguments to pass to pwgen (default: --ambiguous --symbols --numerals --capitalize)
# Simple form
# Creates 10 user accounts (sagc001 to sagc010)
# Passwords of 12 characters in length
sudo ./trainusers.sh \
-a create
# Complex form
# Create a single user account called "sagc_043" with a password of length 23 characters
sudo ./trainusers.sh \
-a create \
-u sagc_ \
-s 43 \
-e 43 \
-l 23
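To check which account names a given combination of -u, -s, -e and -z refers to before running the script, the naming scheme described in the help text (prefix followed by a zero-padded number) can be reproduced with seq. This is only a sketch of the documented naming scheme. preview_usernames is a hypothetical helper and does not claim to mirror how trainusers.sh is implemented.

```shell
# Hypothetical helper: print the usernames for a prefix, start, end and
# zero-padding width. Defaults mirror the trainusers.sh help text.
# Requires GNU seq (for --format).
preview_usernames() {
  local prefix=${1:-sagc} start=${2:-1} end=${3:-10} pad=${4:-3}
  seq --format "${prefix}%0${pad}g" "${start}" "${end}"
}

preview_usernames               # sagc001 ... sagc010
preview_usernames sagc_ 43 43   # sagc_043
```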
The username and password combinations are stored in the file /root/userlist.txt.
For example:
$ cat /root/userlist.txt
sagc_043:ea4athaX<ohrHi7HiThee)n
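When turning this file into handouts, note that generated passwords contain symbols and may therefore include a colon. A minimal sketch for splitting each line on the first colon only is shown below. The function name split_userlist is an assumption for illustration and is not part of trainusers.sh.

```shell
# Hypothetical helper: read "username:password" lines on stdin and print
# "username<TAB>password". Only the first ":" is treated as the separator,
# so a password containing ":" is kept intact.
split_userlist() {
  local line user pass
  while IFS= read -r line; do
    user=${line%%:*}   # everything before the first colon
    pass=${line#*:}    # everything after the first colon
    printf '%s\t%s\n' "${user}" "${pass}"
  done
}

# Example usage:
# sudo cat /root/userlist.txt | split_userlist
```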
Once user accounts are created, we want to ensure that no one can use those accounts until we say so. This is achieved by:
# Preventing ANY SSH logins using passwords, only SSH keys allowed
sudo ./trainusers.sh \
-a pwoff
# Lock user accounts
# Make sure to supply the same -u -s -e arguments as
# used for creation, otherwise some user accounts will not be locked
sudo ./trainusers.sh \
-a lock
# Allow SSH logins using passwords
sudo ./trainusers.sh \
-a pwon
# Unlock a specific user's account so only they are able to login
# We'll unlock sagc001 by specifying -s and -e
sudo ./trainusers.sh \
-a unlock \
-s 1 \
-e 1
# Unlock all user accounts that were created
sudo ./trainusers.sh \
-a unlock
Deleting user accounts will also delete their home directory/files. This is particularly useful if you want to reuse/recycle the VM for another day/workshop:
# Delete users sagc003 to sagc005
sudo ./trainusers.sh \
-a delete \
-s 3 \
-e 5
# Delete all default user accounts
sudo ./trainusers.sh \
-a delete
If you have to instantiate and manage a suite of VMs for a hands-on session, doing things one by one quickly becomes tedious and error-prone. Instead, we will use an API for instantiating VMs and performing actions on them. Below, I show how this can be achieved using simple for loops. However, you need to be aware of any API rate limits, which will slow you down, and use some form of "backoff" (preferably exponential) to slow/limit retry attempts.
Source: https://cloud.google.com/sdk/docs/install#deb
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
sudo apt-get install apt-transport-https ca-certificates gnupg
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
sudo apt-get update && sudo apt-get install google-cloud-sdk
gcloud init
The following code will:
- Generate VMs called wgbs-001 through to wgbs-025
- Pull the wgbs workshop data from a public CloudStor link
- Update the workshop-sysadmin repo on the VM
- Create a single user account, sagc001, on the VM
It is worth noting that after the VMs are instantiated, SSH access using passwords is NOT enabled. Similarly, the user account is locked. Just prior to the start of the workshop, you need to enable SSH login using passwords AND unlock the relevant user account(s) so trainees can log into the VMs.
machine_image_name='wgbs'
ZERO_PADDING_LENGTH=3
START=1
END=25
for i in $(seq --format "%0${ZERO_PADDING_LENGTH}g" ${START} ${END}); do
echo "## Creating VM"
# Try creating VM up to 5 times with 2 mins between retries
# We should exponentially backoff instead
n=0
until [ "$n" -ge 5 ]; do
gcloud beta compute instances create "wgbs-${i}" --source-machine-image "${machine_image_name}" && break
n=$((n+1))
sleep 120
done
# Wait for VM to come up before trying to SSH into it
echo "## Waiting for VM to come up"
sleep 60
# Pull the workshop data and extract it
echo "## Pulling data"
gcloud compute ssh wgbs-${i} -- 'sudo wget --continue -O /data/wgbs.tar https://cloudstor.aarnet.edu.au/plus/s/HBCVy6IKjVTJYq7/download && cd /data && sudo tar -xf /data/wgbs.tar && sudo rm -f /data/wgbs.tar'
# Update the workshop-sysadmin repo code
gcloud compute ssh root@wgbs-${i} -- 'cd /root/workshop-sysadmin && git pull && source /opt/miniconda3/etc/profile.d/conda.sh && mamba env update --name wgbs --file /root/workshop-sysadmin/wgbs.yaml'
# Setup user accounts
echo "## Creating Users"
gcloud compute ssh wgbs-${i} -- 'sudo /root/workshop-sysadmin/trainusers.sh -a create -s 1 -e 1'
# To help prevent exceeding API rate limit
# We should exponentially backoff instead
echo "## sleeping"
sleep 300
done
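The fixed sleeps in the loop above could be replaced by a retry helper that doubles its delay after each failure, which is the exponential backoff the comments refer to. The following is a minimal sketch and not part of this repository. The function name retry_with_backoff and its arguments (maximum attempts, initial delay in seconds) are assumptions for illustration.

```shell
# Hypothetical helper (not part of this repository).
# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command> [args...]
# Runs the command until it succeeds, doubling the delay after each failure.
# Returns 1 if the command still fails after <max_attempts> attempts.
retry_with_backoff() {
  local max_attempts=$1
  local delay=$2
  shift 2
  local attempt=1
  until "$@"; do
    if [ "${attempt}" -ge "${max_attempts}" ]; then
      echo "Giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "Attempt ${attempt} failed, retrying in ${delay}s" >&2
    sleep "${delay}"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example usage inside the loop above (waits 30s, 60s, 120s, 240s between tries):
# retry_with_backoff 5 30 gcloud beta compute instances create "wgbs-${i}" \
#   --source-machine-image "${machine_image_name}"
```

The same helper could wrap the gcloud compute ssh calls, so that a VM that is not yet reachable is retried rather than relying on a fixed sleep 60.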
Once the VMs are instantiated and user accounts created, extract the login information. This will need to be printed along with the VM IP addresses for use in the workshop.
for i in $(seq --format "%0${ZERO_PADDING_LENGTH}g" ${START} ${END}); do
gcloud compute ssh wgbs-${i} -- "sudo sed 's/^/'\$(hostname)'\t/' /root/userlist.txt"
done \
> wgbs_login.tsv
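The login details collected above still need to be paired with each VM's IP address. One possible way, sketched below, is to list the external IPs with gcloud and join them to wgbs_login.tsv on the VM name. This assumes that hostname on each VM returns the instance name (e.g. wgbs-001) and that each VM has an external IP on its first network interface. The function name and the wgbs_ips.tsv and wgbs_login_with_ip.tsv file names are illustrative.

```shell
# Hypothetical helper: join "name<TAB>ip" lines with "name<TAB>user:password"
# lines. Prints "name<TAB>ip<TAB>user:password" for every login entry, using
# UNKNOWN where no IP was found for a VM name.
join_logins_with_ips() {
  local ips_file=$1 logins_file=$2
  awk -F'\t' -v OFS='\t' '
    FILENAME == ARGV[1] { ip[$1] = $2; next }            # first file: remember the IP per VM
    { print $1, (($1 in ip) ? ip[$1] : "UNKNOWN"), $2 }  # second file: add the IP column
  ' "${ips_file}" "${logins_file}"
}

# Example usage (requires gcloud and running VMs; the filter and format
# expressions are assumptions about your project layout):
# gcloud compute instances list --filter='name~^wgbs-' \
#   --format='value(name,networkInterfaces[0].accessConfigs[0].natIP)' > wgbs_ips.tsv
# join_logins_with_ips wgbs_ips.tsv wgbs_login.tsv > wgbs_login_with_ip.tsv
```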
Trainees will NOT be able to access the VMs until SSH login with passwords is enabled and the relevant user accounts are unlocked. Do both actions:
for i in $(seq --format "%0${ZERO_PADDING_LENGTH}g" ${START} ${END}); do
gcloud compute ssh wgbs-${i} -- 'sudo /root/workshop-sysadmin/trainusers.sh -a pwon && sudo /root/workshop-sysadmin/trainusers.sh -a unlock -s 1 -e 1'
done
Following the workshop, you probably want to disable trainee access to the VMs. Lock the relevant user account(s) and disable SSH logins using passwords:
for i in $(seq --format "%0${ZERO_PADDING_LENGTH}g" ${START} ${END}); do
gcloud compute ssh wgbs-${i} -- 'sudo /root/workshop-sysadmin/trainusers.sh -a lock -s 1 -e 1 && sudo /root/workshop-sysadmin/trainusers.sh -a pwoff'
done
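The loops above all share the same shape: generate the zero-padded VM names, then run one remote command on each. A small wrapper, sketched below, captures that pattern and allows a dry run before touching any VM. The function name run_on_vms and the RUNNER variable are assumptions for illustration. START, END and ZERO_PADDING_LENGTH are the variables set earlier.

```shell
# Hypothetical helper: run one remote command on every workshop VM.
# RUNNER defaults to "gcloud compute ssh"; set RUNNER=echo for a dry run
# that only prints what would be executed.
run_on_vms() {
  local remote_cmd=$1
  local start=${START:-1} end=${END:-25} pad=${ZERO_PADDING_LENGTH:-3}
  local i
  for i in $(seq --format "%0${pad}g" "${start}" "${end}"); do
    # RUNNER is deliberately unquoted so "gcloud compute ssh" splits into words
    ${RUNNER:-gcloud compute ssh} "wgbs-${i}" -- "${remote_cmd}" ||
      echo "wgbs-${i}: command failed" >&2
  done
}

# Example usage:
# RUNNER=echo run_on_vms 'sudo /root/workshop-sysadmin/trainusers.sh -a pwoff'   # dry run
# run_on_vms 'sudo /root/workshop-sysadmin/trainusers.sh -a pwoff'               # for real
```

Reporting failures per VM, rather than stopping at the first one, makes it easier to see which machines need a second attempt.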
The above loop for instantiating VMs is slow, since an API rate limit prevents more rapid creation of VMs with this approach. Instead, we should aim to use instance templates and create a group of managed VMs: https://stackoverflow.com/a/51978768/1413849
gcloud compute instances bulk create \
--name-pattern="wgbs-#" \
--count=25