Bio-Ansible

Do bioinformatics, not sysadmin - run the playbook and get back to work!

Quick start

We assume some familiarity with Ansible.

Bring up a VM (AWS, OpenStack / NeCTAR, etc.)
Add your SSH public key to ~/.ssh/authorized_keys on the instance
Edit the hosts file to add the target VM's IP address, then edit group_vars/all to set:
- sudo_guy to the username used to log into the remote machine
- main_guy to a username that will be created for installing software (can be the same as sudo_guy)
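The key and connectivity steps can also be scripted. The sketch below is one possible approach, not part of bio-ansible: push_key and check_reachable are hypothetical helper names, and the key path, login user and IP address in the usage comment are placeholders. push_key assumes you can already log in to the VM some other way (for example with the key pair your cloud provider injected). Setting DRY_RUN=1 makes each helper print its command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical quick-start helpers (not part of bio-ansible).
# With DRY_RUN=1 set, each helper prints its command instead of running it.

push_key() {   # usage: push_key ~/.ssh/id_ed25519.pub ubuntu@203.0.113.10
  # Appends the public key to ~/.ssh/authorized_keys on the VM;
  # ssh-copy-id skips keys that already let you log in.
  ${DRY_RUN:+echo} ssh-copy-id -i "$1" "$2"
}

check_reachable() {   # usage: check_reachable
  # Ansible's ping module logs in over SSH and checks for a usable Python
  # on each host in the inventory; it is not an ICMP ping.
  ${DRY_RUN:+echo} ansible -i hosts all -m ping
}
```

If check_reachable reports "pong" for every host, Ansible can log in and the playbook run should at least get past the connection stage.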

ansible-playbook -i hosts all.yml

Introduction

Bio-ansible is multi-purpose: it can set up a whole army of genomics-focused bioinformatics servers from scratch, or just install a handful of selected tools. A subset of the playbooks can be run as a non-privileged user, in particular if you are just installing bio-tools in your home directory on a shared system (e.g. HPC).

However, you may still need to install some "common" dependencies, which can require sudo. Also note that Ansible tasks are intended to be ‘idempotent’: if you run them again, they will generally only make the changes needed to bring the system to the desired state. This means it is safe to rerun the same playbook multiple times.
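Idempotence can be checked from the PLAY RECAP that ansible-playbook prints at the end of a run: on a second run against an already-configured host, the changed count should generally be 0. The helper below is a hypothetical sketch, not part of bio-ansible; it assumes the standard recap line format (host : ok=N changed=N ...) and only looks at the changed counts. Some tasks, such as shell commands without explicit change detection, may report a change on every run, so treat a non-zero count as a prompt to look closer rather than as a failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ansible-playbook output on stdin and succeed only
# if every host in the PLAY RECAP reports changed=0.
recap_unchanged() {
  local recap
  # Recap lines look like: "203.0.113.10 : ok=42   changed=0    unreachable=0 ..."
  recap=$(grep -E ': ok=[0-9]+ +changed=[0-9]+' || true)
  # No recap at all (e.g. the run aborted early) is reported as status 2
  [ -n "$recap" ] || return 2
  # A non-zero count always starts with a digit from 1 to 9
  ! grep -Eq 'changed=[1-9]' <<<"$recap"
}

# Usage, on a host that has already been provisioned once:
#   ansible-playbook -i hosts all.yml | tee rerun.log
#   recap_unchanged < rerun.log && echo "no changes on rerun"
```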

These playbooks target Ubuntu 20.04 and 22.04 - they may work with small modifications on newer Ubuntu releases and other Debian-flavoured distros. YMMV.

Running bio-ansible

Setup and dependencies

Install ansible

mkdir -p ~/.virtualenvs
virtualenv -p python3 ~/.virtualenvs/ansible
source ~/.virtualenvs/ansible/bin/activate
pip3 install -U pip

# bio-ansible requires Ansible 8 (ansible-core 2.15.x); newer versions may work.
# ">=8,<9" accepts any 8.x release ("==8" would pin exactly 8.0.0)
pip3 install -U "ansible>=8,<9"
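To confirm the virtualenv picked up the expected release, you can check the first line of ansible --version, which for recent ansible-core releases reads like "ansible [core 2.15.13]". The helper below is a hypothetical sketch that accepts only the 2.15.x series named above, so loosen the pattern if you are deliberately trying a newer version. If the virtualenv tool is not installed, python3 -m venv ~/.virtualenvs/ansible (from the Python standard library) creates an equivalent environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the first line of "ansible --version"
# (read on stdin) reports an ansible-core 2.15.x release.
is_core_215() {
  local first re='^ansible \[core 2\.15\.[0-9]+\]'
  IFS= read -r first || true   # tolerate input without a trailing newline
  [[ $first =~ $re ]]
}

# Usage, inside the activated virtualenv:
#   ansible --version | is_core_215 && echo "ansible-core 2.15.x found"
```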

Clone the git repo and change into it:

git clone https://github.com/MonashBioinformaticsPlatform/bio-ansible.git
cd bio-ansible

Edit the hosts file to add the remote host IP addresses to the appropriate group. If running against remote host(s), set up your SSH keys and use ssh-add to add them to your local ssh-agent.
Edit group_vars/all to set the main_guy variable to your username (the username used on the target host[s] to install software; it can be the same as sudo_guy)
Optional: Download any tar archives for non-FOSS software into tarballs/ (or the path set in the tarballs_path variable) - see the section on manually downloading tarballs below.
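The SSH agent and inventory steps can be checked before any playbook runs. In this hypothetical sketch (helper names and key path are illustrative), load_key starts an agent only when SSH_AUTH_SOCK is unset and then adds your private key, and show_groups uses ansible-inventory --graph to print which group each host from the hosts file ended up in, without contacting the hosts. Setting DRY_RUN=1 prints the ssh-add and ansible-inventory commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not part of bio-ansible).

load_key() {   # usage: load_key ~/.ssh/id_ed25519
  # Start an agent only if none is configured; call this in your current shell
  # (not a subshell) so the exported SSH_AUTH_SOCK/SSH_AGENT_PID persist.
  if [ -z "${SSH_AUTH_SOCK:-}" ]; then
    eval "$(ssh-agent -s)"
  fi
  ${DRY_RUN:+echo} ssh-add "$1"
}

show_groups() {   # usage: show_groups
  # Parses the local hosts file and prints the group tree; no SSH involved.
  ${DRY_RUN:+echo} ansible-inventory -i hosts --graph
}

# After load_key, "ssh-add -l" lists the keys the agent holds.
```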

Running the playbooks

Install many bioinformatics tools as environment 'modules'. This is often possible as a non-privileged user without sudo. Tools are installed as the user set in the main_guy variable:

ansible-playbook -i hosts bio.yml

Install system-wide dependencies and packages - sudo privilege is required:

ansible-playbook -i hosts common.yml

Interactive web-based services - sudo privilege is required:

ansible-playbook -i hosts common.yml

Or, if you want to try installing everything above in one go (sudo privilege is required on the target host[s]):

ansible-playbook -i hosts all.yml
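The sudo-requiring playbooks assume the connecting user can escalate privileges on the target. If sudo there asks for a password, ansible-playbook's --ask-become-pass (-K) option prompts for it once at the start of the run, and --limit restricts a run to one host or group from the hosts file. The wrapper below is a hypothetical sketch combining the two; run_sudo_playbook is an illustrative name, and DRY_RUN=1 prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of bio-ansible).
run_sudo_playbook() {   # usage: run_sudo_playbook common.yml [host-or-group]
  local playbook=$1 target=${2:-all}
  # --ask-become-pass prompts once for the sudo password used on the targets;
  # --limit runs only against the given host or inventory group.
  ${DRY_RUN:+echo} ansible-playbook -i hosts "$playbook" --ask-become-pass --limit "$target"
}

# Example: run_sudo_playbook all.yml 203.0.113.10
```

On cloud images where the login user has passwordless sudo, the --ask-become-pass prompt is unnecessary and can be left out.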

Installing specific tools

Alternatively, you can install specific tools without running the whole playbook by using tags:

ansible-playbook -i hosts bio.yml --tags samtools,star,subread

You can see all available tags for a playbook with:

ansible-playbook -i hosts bio.yml --list-tags
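A misspelled tag generally just matches no tasks, so the tool you wanted is quietly skipped. The hypothetical has_tag helper below checks a name against the --list-tags output before you pass it to --tags; it assumes the "TASK TAGS: [a, b, c]" lines that ansible-playbook prints in that listing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "ansible-playbook <playbook> --list-tags" output on
# stdin and succeed if the given tag appears in any "TASK TAGS: [...]" line.
has_tag() {   # usage: ansible-playbook -i hosts bio.yml --list-tags | has_tag samtools
  local wanted=$1
  grep -o 'TASK TAGS: \[[^]]*]' \
    | sed -e 's/^TASK TAGS: \[//' -e 's/]$//' \
    | tr ',' '\n' \
    | sed -e 's/^ *//' -e 's/ *$//' \
    | grep -Fx -- "$wanted" >/dev/null   # reads all input, so no early-exit SIGPIPE
}
```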

Pro tip: add -v (or -vvv for more detail) to get verbose output when diagnosing failures.

singularity-hpc (shpc)

Some modules are installed via shpc, which wraps Singularity containers as Lmod modules. Users can also install their own modules with a small amount of configuration. You can find many tools pre-packaged for shpc at the shpc-registry.

To manage their own shpc containers and modules, users should run:

shpc config inituser

# Create a directory for all user shpc containers and module definitions
mkdir -p $HOME/shpc

shpc config set container_base $HOME/shpc/containers
shpc config set module_base $HOME/shpc/modules
shpc config set views_base $HOME/shpc/views

# Make Lmod aware of the user's module definitions
module use $HOME/shpc/modules
# Make the MODULEPATH setting persist across logins
echo -e '\nexport MODULEPATH=$HOME/shpc/modules:$MODULEPATH' >>~/.bashrc
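The echo >> ~/.bashrc line appends a new copy every time it is run, so repeating the setup leaves duplicate lines. A hypothetical append_once helper, sketched below in the same spirit as Ansible's idempotent tasks, adds a line only if the file does not already contain it verbatim.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append LINE to FILE unless an identical line is already there.
append_once() {   # usage: append_once FILE LINE
  local file=$1 line=$2
  # -q quiet, -x whole-line match, -F fixed string (no regex);
  # a missing file is treated as "line not present".
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Single quotes keep $HOME and $MODULEPATH unexpanded until ~/.bashrc runs:
#   append_once ~/.bashrc 'export MODULEPATH=$HOME/shpc/modules:$MODULEPATH'
```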

Building a Docker image

See README.docker.md

Frequently asked questions

Other

Manually downloading tarballs

Because of licensing restrictions, some installation files must be downloaded manually into a 'tarballs' directory. By default this is tarballs/ in the playbook base path; the location can be changed with the tarballs_path variable if required. The playbooks will skip installation of those packages if they don't find the archive files in that directory.
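Because a missing archive means the package is skipped, it can help to confirm the files are in place before a run. In this hypothetical sketch, missing_tarballs takes the tarballs directory and the archive file names you expect (the names are not listed here, so supply your own) and prints each one that is absent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report expected archives that are not in DIR.
missing_tarballs() {   # usage: missing_tarballs DIR NAME...
  local dir=$1 name rc=0
  shift
  for name in "$@"; do
    if [ ! -f "$dir/$name" ]; then
      echo "missing: $dir/$name"
      rc=1
    fi
  done
  return "$rc"   # 0 only if every named archive was found
}

# Example (archive name is a placeholder):
#   missing_tarballs tarballs example-tool-1.0.tar.gz
```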

Manual scripts

The scripts/ directory contains scripts to download various databases. These have deliberately not been added to the Ansible playbooks.