/pima-docker

Seperating the Docker Installation and codebase from the core pima codebase. This way the latest version is always created with the latest docker

Primary LanguageShell

Docker container for PIMA

This repository has the following structure:

  • Interfaces (holds the interfaces for interacting with the docker)
  • Internal (holds the scripts and reference files used for docker image building)
  • Installation Script (The installation script used to install and build the docker image)

Table of Contents

#Installation and preparation of PIMA docker

Prerequisites

In order to install this software you must have administrator permissions. These permissions are required to install the needed GPU drivers and the Docker daemon. If the software is already installed skip to the operations section.

PiMA relies on GPU acceleration and parallezation for parts of its pipeline. Therefore a graphics card with a CUDA Compute Capability of >=6.0. Handy Reference linking GPUs to Compatibility

The Docker environment and associated files require at least 100gb to build correctly and execute. It is recommended that more than 200gb be available to the host machine for.

Installation

Download the installation scripts to your system

wget https://raw.githubusercontent.com/appliedbinf/pima-docker/main/InstallScript.sh

To download the docker image with kraken databases loaded: (estimated size 70gb)

sudo bash InstallScript.sh -k

To download the docker image without: (estimated size 8gb)

sudo bash InstallScript.sh

This process will install the drivers, the docker packages. It will take a while and requires elevated permissions

Add Docker Group

Though the installation script attempts to configure the docker group, you may need to run the following to interact the docker outside of root

sudo groupadd docker
sudo usermod -aG docker $USER

Close your shell and reopen it so that changes may take effect and verify that you may execute docker commands

docker run hello-world

The full documentation for this process is here

Testing installation

In order to test if the installation was successful, run the following from the interactive docker shell.

sudo run -it appliedbioinformaticslab/pima-docker:kraken

You should see the --help output for PiMA.

Using the docker

There are two ways of interacting with the Docker:

  1. Through the included python interface
  2. Directly calling it

Python Interface

After running that install there should be two files created in the installation directory.

pima_interface.py <- This is the python interface script
Preloaded.json <- This json file denotes reference files that are preloaded in the docker

The python Interface manages calling the docker and handling standard arguments.

Note: To run this script, you may require elevated permissions depending on how docker was installed

Quickstart

A typical fastq run can be executed with given reference & mutation files

python pima_inteface --reference_genome <relative path to reference file> --mutation <relative path to mutations file> \
--Fastq <relative path to fastq files directory> --output <relative path to desired output directory>

Alternatively, one may forego providing a reference genome and mutation file, and use one of the defaults included in Preloaded.json

python pima_inteface --Preloded_Reference <Desired Organism>\
--Fastq <relative path to fastq files directory> --output <relative path to desired output directory>

All available Arguments

The full description of each commandline option is provided below.

usage: pima_interface.py [-h] (-f FAST5 | -q FASTQ) [-r REFERENCE_GENOME]
                         [-m MUTATION]
                         [-R {bacillus_anthracis,bacillus_anthracis_STERNE,burkholderia_psuedomallei,francisella_tularensis,francisella_tulare
nsis_LVS,klebsiella_pneumoniae,yersinia_pestis,yersinia_pestis_KIM10+,yersinia_pestis_KIM5}]
                         -o OUTPUT

Pima docker python interface

optional arguments:
  -h, --help            show this help message and exit
  -f FAST5, --Fast5 FAST5
                        Path to the Directory Containing Fast5 Files
  -q FASTQ, --Fastq FASTQ
                        Path to the Directory Containing Fastq Files
  -r REFERENCE_GENOME, --reference_genome REFERENCE_GENOME
                        Path to the Reference Genome
  -m MUTATION, --mutation MUTATION
                        Path to AMR mutation file
  -R {bacillus_anthracis,bacillus_anthracis_STERNE,burkholderia_psuedomallei,francisella_tularensis,francisella_tularensis_LVS,klebsiella_pneu
moniae,yersinia_pestis,yersinia_pestis_KIM10+,yersinia_pestis_KIM5}, --Preloded_Reference {bacillus_anthracis,bacillus_anthracis_STERNE,burkho
lderia_psuedomallei,francisella_tularensis,francisella_tularensis_LVS,klebsiella_pneumoniae,yersinia_pestis,yersinia_pestis_KIM10+,yersinia_pe
stis_KIM5}
                        Select one of the preloaded Reference and Mutation
                        Options
  -o OUTPUT, --output OUTPUT
                        Path to output file

Direct Access

For finer control, one may pass parameters directly to the docker as though it were pima

The standard format for executing a docker image is as follows:

docker run -it --gpus all --mount type=bind,source=<DesiredDirectory>,target=/home/DockerDir/mountpoint/ appliedbioinformaticslab/pima-docker:kraken <any arguments to pima>

** A full treatment of how to interact with docker containers via mounting is given here **
** Note: the --gpus all flag denotes that the container may access GPUs on the host device and is required **

Examples

Consider an example scenario where you want to assemble Bacillus anthracis ont reads. If the reference file is named ref.fasta and the query fast5 files are in the folder named barcodes_folder, the mutation regions bed file is named mutation_regions.bed and the output folder you named is ont_output then your options are as follows:

Python Interface

You may either provide the reference files:

python pima_inteface --reference_genome ref.fasta --mutation mutation_regions.bed \
--Fast5 barcodes_folder/ --output ont_outpt

Or use the included reference and mutation genome files

python pima_inteface --Preloded_Reference bacillus_anthracis --Fast5 barcodes_folder/ --output ont_outpt

Direct access

The direct access command essentially appends all the flags for pima to the docker command:

docker run -it --gpus all --mount type=bind,source=<DesiredDirectory>,target=/home/DockerDir/mountpoint/ appliedbioinformaticslab/pima-docker:kraken \
--out ont_output --ont-fast5 barcodes_folder --threads 16 --overwrite --genome-size 5m \
--verb 3 --reference-genome ref.fasta --mutation-regions mutation_regions.bed