Implementation of the miRBooking algorithm and metrics in C
- fast and memory efficient
- usable from Python, JavaScript and Vala via GObject Introspection
- memory-mapped score tables and target and miRNA FASTA files for a low memory footprint in parallel execution
- static linking support for a more portable binary
- stdin/stdout for piping from and into other tools
```
mirbooking --targets targets.fa
           --mirnas mirnas.fa
           --seed-scores scores-7mer-3mismatch-ending
           [--accessibility-scores accessibility-scores[.gz]]
           [--supplementary-model none]
           [--supplementary-scores scores-3mer]
           [--input stdin]
           [--output stdout]
           [--output-format tsv]
           [--sparse-solver best-available]
           [--max-iterations 100]
           [--5prime-footprint 9]
           [--3prime-footprint 7]
           [--cutoff 100]
           [--relative-cutoff 0]
           [--blacklist blacklist.tsv]
```
For detailed usage and options, run `mirbooking --help`.
The command line program expects a number of inputs:

- `--targets`, a FASTA file of RNA transcripts where the identifier is the accession, with support for the alternative flavours of NCBI RefSeq and GenBank via `--ncbi-targets` and of GENCODE via `--gencode-targets`
- `--mirnas`, a FASTA file of mature miRNAs where the identifier is the accession, with support for the alternative flavour of miRBase via `--mirbase-mirnas`
- `--seed-scores`, a sparse score table of seed free energies, which can be generated using the `generate-score-table` program described below
- `--accessibility-scores`, which contains entries with the position-wise free energy contribution (or penalty) on the targets
- `--supplementary-scores`, which contains either a 4mer or a 3mer table
- `--input`, a quantity file mapping target and miRNA accessions to their expressed quantities in picomolar (pM) units
Tables for seed and supplementary scores are provided in the `data` folder.
They were computed with RNAcofold binding energies from the ViennaRNA package.
Note that the Yan et al. model (`--supplementary-model=yan-et-al-2018`) requires
a 3mer table, whereas the Zamore et al. model
(`--supplementary-model=zamore-et-al-2012`) requires a 4mer table.
Tables for seed and supplementary bindings are automatically located (new in 2.3).
The `--cutoff` parameter can exploit a known upper bound on the complex
concentration to adjust the granularity of the model. Only interactions that
can ideally reach the specified picomolar concentration are modeled.
The `--relative-cutoff` parameter is similar, but instead filters based on the
ideal substrate bound fraction.
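The exact definition of "ideal" is not given here, so the following is only a sketch under a loudly stated assumption: an interaction is evaluated as if it had no competition and the free miRNA concentration equaled its total concentration, giving a bound fraction of `E / (E + Km)`. The function names are hypothetical and this is not miRBooking's filtering code.

```c
#include <stdbool.h>

/* ASSUMPTION: "ideal" means no competition, with free miRNA equal to total
 * miRNA, so the site follows a simple Michaelis-Menten saturation curve. */
static double ideal_bound_fraction(double mirna_pm, double km_pm)
{
    return mirna_pm / (mirna_pm + km_pm);
}

/* Ideal duplex concentration (pM): the bound fraction applied to the target,
 * capped by the miRNA pool since each duplex consumes one miRNA. */
static double ideal_complex_pm(double target_pm, double mirna_pm, double km_pm)
{
    double complex_pm = target_pm * ideal_bound_fraction(mirna_pm, km_pm);
    return complex_pm < mirna_pm ? complex_pm : mirna_pm;
}

/* Keeps an interaction only if it passes both the absolute (--cutoff) and the
 * relative (--relative-cutoff) thresholds under the assumption above. */
static bool keep_interaction(double target_pm, double mirna_pm, double km_pm,
                             double cutoff_pm, double relative_cutoff)
{
    return ideal_complex_pm(target_pm, mirna_pm, km_pm) >= cutoff_pm &&
           ideal_bound_fraction(mirna_pm, km_pm) >= relative_cutoff;
}
```

With the defaults shown in the usage (`--cutoff 100`, `--relative-cutoff 0`), only the absolute check would discard anything in this sketch.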
The output is a TSV with the following columns:

Column | Description |
---|---|
gene_accession | Gene accession with version (new in 2.3) |
gene_name | Name of the gene or N/A if unknown (new in 2.3) |
target_accession | Target accession with version |
target_name | Name of the target or N/A if unknown |
target_quantity | Total target concentration (pM) |
position | Site position on the target |
mirna_accession | miRNA accession |
mirna_name | Name of the miRNA or N/A if unknown |
mirna_quantity | Total miRNA concentration (pM) |
score | Michaelis-Menten constant of the miRNA::MRE duplex |
quantity | miRNA::MRE duplex concentration at this target position (pM) |
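For downstream tools written in C, a row of this output can be split into a struct. This sketch assumes the 11 columns are tab-separated in exactly the order of the table above (2.3 and later) and that the caller has already skipped any header row; `Site` and `parse_site` are hypothetical names, and numeric fields are not validated.

```c
#include <stdlib.h>
#include <string.h>

#define N_COLUMNS 11

/* One row of the default TSV output; string fields point into the parsed line. */
typedef struct {
    const char *gene_accession;
    const char *gene_name;
    const char *target_accession;
    const char *target_name;
    double      target_quantity; /* pM */
    long        position;
    const char *mirna_accession;
    const char *mirna_name;
    double      mirna_quantity;  /* pM */
    double      score;           /* Michaelis-Menten constant */
    double      quantity;        /* duplex concentration, pM */
} Site;

/* Splits LINE in place on tabs and fills SITE. Returns 0 on success and -1
 * when the line does not have exactly N_COLUMNS fields. */
static int parse_site(char *line, Site *site)
{
    char *fields[N_COLUMNS];
    int n = 0;
    char *p = line;

    line[strcspn(line, "\r\n")] = '\0'; /* drop the trailing newline */
    for (;;) {
        char *tab = strchr(p, '\t');
        if (n == N_COLUMNS)
            return -1; /* too many columns */
        fields[n++] = p;
        if (tab == NULL)
            break;
        *tab = '\0';
        p = tab + 1;
    }
    if (n != N_COLUMNS)
        return -1;

    site->gene_accession   = fields[0];
    site->gene_name        = fields[1];
    site->target_accession = fields[2];
    site->target_name      = fields[3];
    site->target_quantity  = strtod(fields[4], NULL);
    site->position         = strtol(fields[5], NULL, 10);
    site->mirna_accession  = fields[6];
    site->mirna_name       = fields[7];
    site->mirna_quantity   = strtod(fields[8], NULL);
    site->score            = strtod(fields[9], NULL);
    site->quantity         = strtod(fields[10], NULL);
    return 0;
}
```

Because the struct borrows pointers into the line buffer, the buffer must outlive the `Site`; copy the strings if rows are kept after the next read.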
The detailed TSV output, which expands the score into its constituents, can be
produced with `--output-format=tsv-detailed` (new in 2.3). In this mode, the
`score` column is replaced by `kf`, `kr`, `kcleave`, `krelease`, `kcat`,
`kother`, `kd` and `km`.
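As a reminder of how the derived constants typically relate to the rates, the textbook Michaelis-Menten definitions are `Kd = kr / kf` and `Km = (kr + kcat) / kf`. Whether miRBooking's `kd` and `km` columns follow exactly these formulas (for instance, how `kother` enters) is an assumption not confirmed here, so treat this only as a sanity check on a detailed output row.

```c
/* Textbook Michaelis-Menten relations. ASSUMPTION: these match miRBooking's
 * kd and km columns; the model may include extra terms such as kother. */

/* Dissociation constant: reverse over forward rate (pM if kf is 1/(pM*s)). */
static double dissociation_constant(double kf, double kr)
{
    return kr / kf;
}

/* Michaelis constant: catalysis adds a second way out of the complex, so
 * km >= kd whenever kcat >= 0. */
static double michaelis_constant(double kf, double kr, double kcat)
{
    return (kr + kcat) / kf;
}
```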
GFF3 output can be produced with `--output-format=gff3`. The score
indicates the bound fraction of the position.
Wiggle output can also be produced with `--output-format=wig`. The score is
the position-wise bound fraction of the substrate, which properly accounts for
overlapping microRNAs.
The `--blacklist` parameter indicates a file containing interactions that
the model should ignore. This is particularly useful if you know beforehand
that they will be too weak at equilibrium to be worth modeling. The format is a
three-column TSV containing only the `target_accession`, `position` and
`mirna_accession` columns from the output format.
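A tool preparing or checking such a file could load it as below. This is a hypothetical sketch, not miRBooking's loader: it assumes accessions contain no whitespace and fit in 63 characters, skips any line that does not parse (such as a header row), and uses a linear scan where a hash table keyed on the triple would scale better.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char target_accession[64];
    long position;
    char mirna_accession[64];
} BlacklistEntry;

typedef struct {
    BlacklistEntry *entries;
    size_t          len;
} Blacklist;

/* Reads "target_accession<TAB>position<TAB>mirna_accession" rows. Returns 0,
 * or -1 on allocation failure. Unparsable lines (e.g. a header) are skipped. */
static int blacklist_load(FILE *fp, Blacklist *bl)
{
    char line[256];
    size_t cap = 0;

    bl->entries = NULL;
    bl->len = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        BlacklistEntry e;
        if (sscanf(line, "%63s %ld %63s", e.target_accession, &e.position,
                   e.mirna_accession) != 3)
            continue;
        if (bl->len == cap) { /* grow geometrically */
            size_t new_cap = cap ? 2 * cap : 16;
            BlacklistEntry *tmp = realloc(bl->entries, new_cap * sizeof *tmp);
            if (tmp == NULL) {
                free(bl->entries);
                bl->entries = NULL;
                bl->len = 0;
                return -1;
            }
            bl->entries = tmp;
            cap = new_cap;
        }
        bl->entries[bl->len++] = e;
    }
    return 0;
}

/* True when the (target, position, miRNA) triple is blacklisted. */
static bool blacklist_contains(const Blacklist *bl, const char *target,
                               long position, const char *mirna)
{
    for (size_t i = 0; i < bl->len; i++) {
        const BlacklistEntry *e = &bl->entries[i];
        if (e->position == position &&
            strcmp(e->target_accession, target) == 0 &&
            strcmp(e->mirna_accession, mirna) == 0)
            return true;
    }
    return false;
}
```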
You'll need Meson and Ninja as well as GLib development files installed on your system.
```
mkdir build && cd build
meson --buildtype=release
ninja
ninja install
```
To generate fast code, configure with `meson -Doptimization=3`.
You can perform a local installation using `meson --prefix=$HOME/.local`, but
you'll need `LD_LIBRARY_PATH` set accordingly since the `mirbooking` program
uses a shared library. Alternatively, static linking can be done by calling
`meson --default-library=static`.
To generate introspection metadata, use `meson -Dwith_introspection=true`. To
generate Vala bindings, use `meson -Dwith_vapi=true`.
CBLAS is required; instead of the default netlib CBLAS, you can opt for the
ATLAS (`-Dwith_atlas=true`) or OpenBLAS (`-Dwith_openblas=true`)
implementations. If configured with `-Dwith_mkl=true`, MKL CBLAS is used
instead. The OpenMP flavour of OpenBLAS is used when configured with
`-Dwith_openmp=true`.
FFTW can optionally be used to compute more accurate silencing by specifying
`meson -Dwith_fftw3=true`. If you redistribute miRBooking source code, be
careful not to enable this by default because of the GPL license covering
this dependency. If you have access to Intel MKL, you can alternatively use its
FFTW3 implementation with `-Dwith_mkl_fftw3=true`.
OpenMP can optionally be used to parallelize the evaluation of partial
derivatives and some supported solvers by specifying `-Dwith_openmp=true`.
MPI can optionally be used to distribute the computation across multiple
machines on supported solvers (i.e. `mkl-cluster`) by specifying
`-Dwith_mpi=true`.
Solver | Build Options |
---|---|
LAPACK | No option since this is the fallback solver. |
SuperLU | -Dwith_superlu=true |
SuperLU MT | -Dwith_superlu_mt=true |
UMFPACK | -Dwith_umfpack=true |
cuSOLVER | -Dwith_cuda=<cuda_toolkit_api_version> -Dwith_cusolver=true |
MKL DSS | -Dwith_mkl=true -Dmkl_root=<path to mkl> -Dwith_mkl_dss=true |
MKL Cluster | -Dwith_mpi=true -Dwith_mkl=true -Dmkl_root=<path to mkl> -Dwith_mkl_cluster=true |
MKL LAPACK | -Dwith_mkl=true -Dmkl_root=<path to mkl> -Dwith_mkl_lapack=true |
PARDISO | -Dwith_pardiso=true |
LAPACK is not a sparse linear solver and thus will not handle typical workloads very well, but it can be orders of magnitude faster on dense Jacobians.
cuSOLVER requires the CUDA toolkit, whose API version must be specified with
`-Dwith_cuda=<cuda_toolkit_api_version>`.
MKL DSS and MKL Cluster can benefit from TBB instead of OpenMP, which can be
enabled with `-Dwith_mkl_tbb=true`.
MKL DSS and MKL Cluster can be used with the 64-bit interface, allowing much
larger systems to be solved, by configuring with `-Dwith_mkl_ilp64=true`.
However, this will break other solvers as it loads a 64-bit BLAS.
PARDISO cannot be used along with MKL DSS because they define common symbols.
By default, the best sparse solver available among the following will be used (new in 2.3):
- MKL-DSS
- PARDISO
- UMFPACK
- SuperLU
- LAPACK
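The preference order above amounts to picking the first solver that was compiled in, with LAPACK as the unconditional fallback. The sketch below only mirrors that documented order; the `Solver` enum and the `available` flags are hypothetical and are not miRBooking's internal representation.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    SOLVER_MKL_DSS,
    SOLVER_PARDISO,
    SOLVER_UMFPACK,
    SOLVER_SUPERLU,
    SOLVER_LAPACK,
    SOLVER_COUNT
} Solver;

/* Documented best-available order, most preferred first. */
static const Solver PREFERENCE[] = {
    SOLVER_MKL_DSS, SOLVER_PARDISO, SOLVER_UMFPACK, SOLVER_SUPERLU, SOLVER_LAPACK
};

/* available[s] tells whether solver s was compiled in (hypothetical flags). */
static Solver best_available_solver(const bool available[SOLVER_COUNT])
{
    for (size_t i = 0; i < sizeof PREFERENCE / sizeof PREFERENCE[0]; i++)
        if (available[PREFERENCE[i]])
            return PREFERENCE[i];
    return SOLVER_LAPACK; /* fallback solver, needs no build option */
}
```

For example, a build configured only with `-Dwith_umfpack=true -Dwith_superlu=true` would resolve to UMFPACK under this ordering.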
In addition to determining the steady state, miRBooking can also perform numerical integration of the microtargetome using the programming API.
In addition to the `mirbooking` binary, this package ships a number of
utilities.
The `generate-score-table` program computes a hybridization energy table for a
given seed mask. Either ViennaRNA or mcff is required to compute energies.
```
generate-score-table [--method=RNAcofold]
                     [--temperature=310.5]
                     [--mask=||||...]
                     [--hard-mask=||||...]
                     --output scores
```
The seed mask defines folding constraints on the target with `|` for a
canonical match, `x` for a canonical mismatch and `.` for no constraint. It
also determines the seed length. If a hard mask is provided, interactions that
do not satisfy it are filtered out (new in 2.3).
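The mask semantics can be sketched as a per-position comparison: the seed length is the mask length, `.` accepts anything, and `|` or `x` must agree with the observed pairing. The pairing encoding used here (one `|` or `x` character per seed position) and all function names are hypothetical; how `generate-score-table` turns the mask into RNAcofold constraints is not shown.

```c
#include <stdbool.h>
#include <string.h>

/* The seed length is implied by the mask length. */
static size_t seed_length(const char *mask)
{
    return strlen(mask);
}

/* True when the mask only uses the three documented symbols. */
static bool valid_mask(const char *mask)
{
    return mask[strspn(mask, "|x.")] == '\0';
}

/* Hard-mask check: PAIRING holds one character per seed position, '|' for a
 * canonical match and 'x' for a canonical mismatch (hypothetical encoding). */
static bool satisfies_mask(const char *pairing, const char *mask)
{
    size_t n = strlen(mask);
    if (strlen(pairing) != n)
        return false;
    for (size_t i = 0; i < n; i++)
        if (mask[i] != '.' && mask[i] != pairing[i])
            return false; /* a constrained position disagrees */
    return true;
}
```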
It's also possible to adjust the folding temperature (new in 2.3).
The number of workers can be tuned by setting the `OMP_NUM_THREADS`
environment variable.
The `mirbooking-iterative` tool is a wrapper script around miRBooking which
takes advantage of the `--blacklist` flag by solving the equilibrium gradually
and excluding weak interactions from subsequent models.
It takes the same arguments as `mirbooking`, with the slight distinction that
`--cutoff` now indicates the target cutoff.
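The step between two rounds can be pictured as writing every weak site of the previous solution to a file in the three-column `--blacklist` format. This is a hedged C illustration of that idea only, not the wrapper script's actual logic: the `SiteResult` struct, the function name and the choice of a strict `quantity < cutoff` test are assumptions.

```c
#include <stdio.h>

/* Hypothetical view of one site from the previous round's output. */
typedef struct {
    const char *target_accession;
    long        position;
    const char *mirna_accession;
    double      quantity; /* duplex concentration, pM */
} SiteResult;

/* Writes sites whose duplex concentration stayed below CUTOFF_PM as
 * "target_accession<TAB>position<TAB>mirna_accession" rows and returns the
 * number of rows written. */
static size_t blacklist_weak_sites(FILE *out, const SiteResult *sites,
                                   size_t n, double cutoff_pm)
{
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        if (sites[i].quantity < cutoff_pm) {
            fprintf(out, "%s\t%ld\t%s\n", sites[i].target_accession,
                    sites[i].position, sites[i].mirna_accession);
            written++;
        }
    }
    return written;
}
```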
The API conforms to the GLib style and enables a wide range of uses. It is fairly easy to use, and a typical experimentation session is:

- create a broker via `mirbooking_broker_new`
- create some sequence objects with `mirbooking_target_new` and `mirbooking_mirna_new`
- set up quantities via `mirbooking_broker_set_sequence_quantity`
- call `mirbooking_broker_evaluate` and `mirbooking_broker_step` repeatedly to perform a full hybridization or numerical integration
- retrieve and inspect the microtargetome with `mirbooking_broker_get_target_sites`
For more detailed usage and a code example, the main program source in
`bin/mirbooking.c` is very explicit, as it performs a full session and outputs
all the target sites.
Poirier-Morency, G. Modélisation des réseaux de régulation de l’expression des gènes par les microARN. (Université de Montréal, 2021). https://doi.org/1866/25104