Bio Brew

Simple package manager for bioinformatic tools.
Fork of drio’s biobrew with some slight architecture changes.

Introduction

Imagine you get access to a new linux box or HPC cluster. You want to start doing your analysis, coding, data-mining but you do not have your toolbox available. Yes, you can ask the sysadmins to install that for you. That may work. You may also install the tools yourself.

BB (biobrew) helps you either way. With a simple oneliner you can have most of your tools ready to use. BB is a tiny package manager based entirely in bash. It makes the extension of recipes (package definitions) very easy.

BB currently works in Linux enviroments (it should work just fine in other flavours of Unix). It only needs a typical Linux enviroment with gcc, curl (and potentially git). Besides classic and inseparable unix tools, bio.brew’s focus is on bioinformatic tools and packages.

bio.brew has a similar feel to it as the homebrew package manager for Mac OSX. The difference being that it is much more lean in terms of functionality and packages provided, it is written entirely in shell instead of a mixture of shell and ruby, it works on more than just Mac OSX, and its focus is only on core bioinformatic tools.

Give it a try and make it better by improving the framework and adding more recipes.

Differences from Original bio.brew

The recipe seed name is a more fundamental part of the system architecture now. Recipes are defined by the combination of the recipe name as well as the seed name. In effect, the recipe name describes a tool and the seed name indicates a specific version of that tool.
The above architecture allows and encourages keeping multiple versions (seeds) of a tool around but having only one version active at a particular time.
The STAGE_DIR is where the compiled resources for a recipe/seed live. This is separate and external to the installation directory.
The installation directory is configurable by defining BB_INSTALL (see configuration below).
- To be an active seed is to have the binaries of the particular seed of the recipe linked in the bin directory of bio.brew
- This is achieved partly through the use of a current symbolic link for each recipe that points to the directory containing the active seed.
Installation of a recipe seed and activation of that seed are two separate steps in this version of bio.brew.
- This allows some sort of testing to be performed before the current version (seed) of the tool is switched to the new version.
Required operations before/after installation and removal are called automatically, instead of having to call these functions inside individual recipes.
Some additional helper functions for installation and checking external dependencies.
Additional bio.brew commands:
- clean – to remove bio.brew files related to a recipe/seed. Useful for initially developing a recipe to clean out failed installation attempts.
- activate – as mentioned above, activation (moving the current pointer for a recipe to the newly installed seed) is a separate step from install.
- test – stubbed functionality that could be used in the future to add testing for a recipe/seed before it is activated.
- fake – If there is a dependency in a recipe, but you don’t want to use bio.brew to install the dependency, use fake to have bio.brew think that it is installed. (Most useful for java dependencies).

Cavets

These changes were made because of the nature of the environment we wanted bio.brew to work in – and because we believe it is very useful to be able to keep older versions of these tools around as necessary. I feel these changes are an improvement, but there are some things not completely integrated from the original bio.brew:

Not all the recipes have been modified to use the new methodology. See bamtools.sh or cufflinks.sh for examples of recipes that do use the new setup and conventions.
The test script(s) has not been tested and will probably not work.

Install

There are a few ways to install and configure bio.brew. My (current) preferred method involves checking out the source from github using git and then using the bb_load_env setup as detailed below. The requirement for this method is of course that you need git installed on your system.

Install with git

git clone git://github.com/vlandham/bio.brew.git

Install without git

Another potential (untested/unused) option is to use curl:

$ # Download the master branch using curl
$ # -L follow redirects; -s silent; -S show error message if fails; -f fail silently on server errors -k ssl issues
$ curl -LksSf http://github.com/vlandham/bio.brew/tarball/master | tar xvz -C. --strip 1

Configuration

Minimum Configuration

At a minimum, you should define the BB_PATH and BB_INSTALL environmental variables and add $BB_INSTALL/bin and $BB_PATH/bin to your PATH.

BB_PATH defines the location of the bio.brew package, while BB_INSTALL tells bio.brew where to install tools.

This will add the bb executable to your path as well as binaries created using bio.brew. Just adding these variables will not add the recipe specific ENV variables to your path (see below).

Here is an example setup for basic bio.brew Brew use. This example assumes you have the bio.brew tool downloaded to the directory: /home/username/bio.brew and want bio.brew to install tools at /home/username/tools

export BB_INSTALL=/home/username/tools
export BB_PATH=/home/username/tools/bio.brew
export PATH="$BB_INSTALL/bin:$BB_PATH/bin:$PATH"

With this configuration, bio.brew will download tar/zip packages to /home/username/tools/tarballs extract and build these packages in /home/username/tools/stage and then symbolically link the correct binaries from these packages into /home/username/tools/bin.

The bb_load_env script described below automates most of this path configuration. It also provides the extra benefit of also adding ENV variables that are setup by particular recipes. This is described in a bit more detail in the Using bio.brew section below.

bb_load_env Helper Script

The small script bb_load_env has been provided to give one option for configuring bio.brew. You can add bb_load_env to your shell’s configuration file. A default value for BB_INSTALL is provided in bb_load_env, but I would specifiy explictly where you want to download these files. For bash, this would look like:

# .bashrc
export BB_INSTALL=/home/username/tools
[[ -s "/home/username/tools/bio.brew/bb_load_env" ]] && source "/home/username/tools/bio.brew/bb_load_env"

That will set the proper path(s). Take a quick peek to load_env, it is a very small script.

Using bio.brew

Installing a new recipe

Here we use bio.brew to install samtools. The -j8 is passed to make when compiling

bb list
# produces a list of possible recipes to install with 
# the versions of the applications that each recipe is 
# made for (the seed of the recipe).
bb install samtools
# output produced by download and install of samtools
# if there is an errror, look at log files in $BB_PATH/log/samtools
bb activate samtools
# this will create sym links of the newly created binaries 
# from the samtools staging directory to $BB_INSTALL/bin
# Having this as a separate step allows you to download an update
# and test it without it replacing your current version of a tool.

As you can see, it is now installed in bio.brew’s local directory

$ ls -acl $BB_INSTALL/bin

Using Tools

Once installed the binaries will be in your path so you can just call them.

$ which samtools
# should output path $BB_INSTALL/bin/samtools

$ samtools view [options]

Keep in mind some recipes (packages) do not generate binary files, for those cases, bb creates special scripts (log/*.sh) and the necessary ENV vars are created. For example, picard is a bunch of jars that cannot be executed directly. By loading the recipe’s enviroment you will have a ENV variable called PICARD that points to the jars. It is then very easy to use the jars as necessary.

For example, once picard is installed and activated, assuming you have sourced the bb_load_env script in your .bashrc as described above, you could use picard jar files without knowing where exactly picard is installed:

$ ls $PICARD
# outputs all jars in picard directory.

$ java -Xmx2g -jar $PICARD/BamToBfq.jar [options]

Its just that easy!

Testing

untested

At that point you can use ./bin/bb to list, install or remove recipes. You can also run the ``tests/do_all.sh`` script to install all the recipes. That may take some time (20 minutes in a fairly powerful machine with 8 cores and using bb -j8 install).

TODO

Allow specific options for actions (-j should not be generic, only for install).
more recipes!
better support for moving between seeds of a recipe
better support for removing recipes accurately and cleanly

yannickwurm/bio.brew