If you encounter any problems, don't hesitate to contact me at karel.brinda@gmail.com.
SMBL is a library of useful rules and Python functions for Snakemake (https://bitbucket.org/johanneskoester/snakemake/) pipelines. It makes it possible to automatically install various bioinformatics programs, such as read mappers, read simulators, and conversion tools. It also supports downloading and converting some important reference sequences in FASTA format (e.g., the human genome).
To install SMBL, you need a Unix-like operating system (e.g., Linux or macOS) and Python 3.3 or newer. SMBL can be installed or upgraded using the following command:
```bash
pip3 install --upgrade smbl
```
If Snakemake is not installed yet, it will be installed automatically together with SMBL.
The current development version of SMBL can be installed directly from its Git repository:

```bash
pip3 install --upgrade git+https://github.com/karel-brinda/smbl
```
To download and install software automatically, SMBL requires the following programs to be present on your Unix system:
- wget or curl
- gcc 4.7+
- git
- make
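Before installing, you can check these prerequisites with a short Python script. This is a minimal sketch and not part of SMBL: the function name `missing_prerequisites` is hypothetical, it only checks that the programs can be found on `PATH` (via `shutil.which`), and it does not verify the gcc version.

```python
import shutil
import sys
from typing import Callable, List, Optional

# Programs listed above; each tuple means "at least one of these must exist".
REQUIRED = [("wget", "curl"), ("gcc",), ("git",), ("make",)]


def missing_prerequisites(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return one readable entry for every requirement not found on PATH."""
    missing = []
    for alternatives in REQUIRED:
        if not any(which(prog) for prog in alternatives):
            missing.append(" or ".join(alternatives))
    return missing


if __name__ == "__main__":
    if sys.version_info < (3, 3):
        sys.exit("SMBL needs Python 3.3 or newer")
    problems = missing_prerequisites()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All required programs found (gcc version not checked)")
```

The `which` parameter defaults to `shutil.which` but can be replaced by any lookup function, which keeps the check easy to test.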
To use SMBL in a Snakefile, import the `smbl` Python package and include the file with all SMBL rules:

```python
import smbl
include: smbl.include()
```
Then you can use all supported programs and data. When one of them appears as an input of a rule, it is downloaded or compiled automatically.
All programs are installed into `~/.smbl/bin/` and all FASTA files into `~/.smbl/fa/`.
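The on-demand behaviour described above can be pictured with the following Python sketch. It is an illustration only, not SMBL's implementation: the helper `ensure_file` and its `build` callback are hypothetical. It mimics the simplest Snakemake case, a rule without inputs, which is run only when its output file is missing, so a program or FASTA file is fetched once and reused afterwards.

```python
import os
from typing import Callable

# Install locations stated in the SMBL documentation.
SMBL_BIN = os.path.expanduser("~/.smbl/bin")
SMBL_FA = os.path.expanduser("~/.smbl/fa")


def ensure_file(path: str, build: Callable[[str], None]) -> str:
    """Call build(path) only if path does not exist yet, then return path.

    Hypothetical helper illustrating "produce the file only when missing";
    it is not part of SMBL or Snakemake.
    """
    if not os.path.exists(path):
        directory = os.path.dirname(path)
        if directory:
            # Create e.g. ~/.smbl/bin on first use.
            os.makedirs(directory, exist_ok=True)
        build(path)
    return path
```

A second call with the same path returns immediately without calling `build` again, just as an already-installed program is not recompiled.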
FASTA file | Variable with its filename |
---|---|
An example small FASTA file | smbl.fasta.EXAMPLE_1 |
An example small FASTA file | smbl.fasta.EXAMPLE_2 |
An example small FASTA file | smbl.fasta.EXAMPLE_3 |
Human genome hg38 (GRCh38) | smbl.fasta.HG38, smbl.fasta.HUMAN_GRCH38 |
Mouse genome mm10 | smbl.fasta.MOUSE_MM10 |
Chimpanzee genome panTro4 | smbl.fasta.CHIMP_PANTRO4 |
The following example demonstrates how SMBL can be used for automatic installation of software.
Create a file named `Snakefile` with the following content:
```python
import smbl

include: smbl.include()

rule all:
    input:
        smbl.prog.DWGSIM,
        smbl.prog.BWA,
        smbl.fasta.EXAMPLE
    params:
        PREF="simulated_reads",
        INDEX="bwa_index"
    output:
        "alignment.sam"
    run:
        # read simulation
        shell("{input[0]} -C 1 {input[2]} {params.PREF}")
        # creating BWA index of the reference sequence
        shell("{input[1]} index {input[2]}")
        # mapping by BWA
        shell("{input[1]} mem {input[2]} {params.PREF}.bfast.fastq > alignment.sam")
```
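To see what the `shell()` calls in this rule run, the following sketch expands the same command templates with Python's `str.format`, which supports both the `{input[0]}` indexing and the `{params.PREF}` attribute syntax used in the rule. The resolved paths are hypothetical placeholders; in a real run, Snakemake substitutes the actual values of `smbl.prog.DWGSIM`, `smbl.prog.BWA` and `smbl.fasta.EXAMPLE`.

```python
from types import SimpleNamespace

# Hypothetical resolved paths; the real values come from smbl.prog.DWGSIM,
# smbl.prog.BWA and smbl.fasta.EXAMPLE.
rule_input = [
    "/home/user/.smbl/bin/dwgsim",
    "/home/user/.smbl/bin/bwa",
    "/home/user/.smbl/fa/example.fa",
]
rule_params = SimpleNamespace(PREF="simulated_reads", INDEX="bwa_index")

# The three command templates from the rule's run: block, copied verbatim.
TEMPLATES = [
    "{input[0]} -C 1 {input[2]} {params.PREF}",
    "{input[1]} index {input[2]}",
    "{input[1]} mem {input[2]} {params.PREF}.bfast.fastq > alignment.sam",
]


def expand(template: str) -> str:
    """Substitute input/params into a template using plain str.format."""
    return template.format(input=rule_input, params=rule_params)


if __name__ == "__main__":
    for template in TEMPLATES:
        print(expand(template))
```

The first command makes DWGSIM write simulated reads with the prefix `simulated_reads`; the third command maps one of those files, `simulated_reads.bfast.fastq`, with BWA-MEM against the indexed reference.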
Then run Snakemake in the same directory:

```bash
snakemake
```
What happens:
- An example FASTA file is downloaded
- DWGSIM and BWA are downloaded, compiled and installed
- DWGSIM simulates reads from the example FASTA file
- These reads are mapped back to the reference by BWA (`alignment.sam` is created)
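For a quick impression of the mapping result, you can count mapped and unmapped reads in `alignment.sam`. The sketch below is not part of SMBL; `summarize_sam` is a hypothetical helper that relies only on the SAM format: header lines start with `@`, the second column is the bitwise FLAG, bit 0x4 marks an unmapped read, and secondary (0x100) and supplementary (0x800) records are skipped so that only primary records are counted.

```python
from typing import Dict, Iterable


def summarize_sam(lines: Iterable[str]) -> Dict[str, int]:
    """Count primary SAM records, split into mapped and unmapped."""
    counts = {"primary": 0, "mapped": 0, "unmapped": 0}
    for line in lines:
        # Skip empty lines and header lines (@HD, @SQ, @PG, ...).
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        # 0x900 = secondary (0x100) | supplementary (0x800).
        if flag & 0x900:
            continue
        counts["primary"] += 1
        if flag & 0x4:
            counts["unmapped"] += 1
        else:
            counts["mapped"] += 1
    return counts
```

For example, `with open("alignment.sam") as sam: print(summarize_sam(sam))` prints the three counts for the alignment produced above.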