The MongoDB for the ANalysis of Transcription factor (TF)-binding site (TFBS) Alterations (MANTA) was originally created in 2015 to study the impact of regulatory mutations in B-cell lymphomas (Mathelier et al. 2015). This second release of the database, MANTA2, stores TFBSs predicted in the human genome by combining ChIP-seq regions from ReMap and JASPAR profiles, as well as the potential impact scores for all possible single nucleotide variants (SNVs) at these TFBSs. Specifically, it houses >48 million TFBS predictions for 225 TFs and covers ~8% of the human genome (hg38).
Availability: The MANTA2 database hosted by the Wasserman lab can be accessed via a dedicated web server, and a MongoDB dump is available for download at Zenodo.
How to cite: Fornes, O. et al. MANTA2, update of the Mongo database for the analysis of transcription factor binding site alterations. Sci. Data 5:180141 doi: 10.1038/sdata.2018.141 (2018).
The repository is organized as follows:
- The
examples
folder contains a shell script (i.e.get_VCF_example.sh
) to generate a variant file in VCF format - The
jaspar2remap
folder contains the scripts that were used to merge JASPAR TFBS predictions with ReMAP ChIP-seq peaks along with some instructions on how to use them - The
manta2
folder contains the scripts related to loading the data to the MongoDB system and the web interface of MANTA2 - The
snv_computation
folder contains the scripts used to compute the impact scores of the SNVs within the MANTA2 TFBSs and instructions on how to use them - The
validation
folder contains the scripts used for the external validation of MANTA2 on allelic imbalance data - The symbolic link to the
search_manta2.py
script provides programmatic access to MANTA2 (both locally and hosted at the Wasserman Lab)
MANTA2 requires the following dependencies:
MongoDB
(≥2.4; version 3.4 or higher is strongly recommended)Python (>=2.7)
with theFlask
andPyMongo
libraries
To create a local build of MANTA2, follow the next steps:
- Install (if necessary) and initialize the MongoDB system: instructions for the main OS can be found here
- Download, uncompress and unpack the MANTA2 MongoDB dump: this creates a folder (i.e.
manta2_mongodb_dump
) containing 5 different files:experiments.bson
,experiments.metadata.json
,system.indexes.bson
,tfbs_snvs.bson
, andtfbs_snvs.metadata.json
- Restore MANTA2 from the MongoDB dump:
mongorestore -d $DB_NAME $PATH_TO_MONGODB_DUMP
, where$DB_NAME
specifies the database name for MANTA2 in the MongoDB system (e.g.manta2
) and$PATH_TO_MONGODB_DUMP
the path to the foldermanta2_mongodb_dump
from the previous step - Create a user with
read
privileges to the MANTA2 database:mongo --eval "db.getSiblingDB('$DB_NAME').createUser({user: '$USER', pwd: '$PASSWORD', roles: [{role: 'read', db: '$DB_NAME'}]})"
Note that, in the commands from the previous steps, the strings $DB_NAME
, $PATH_TO_MONGODB_DUMP
, $USER
, $PASSWORD
should be replaced with your own.
To check whether MANTA2 was restored successfully, type mongo
on a terminal to open a MongoDB shell and then type show databases
. The system should list all available Mongo databases, including MANTA2 (e.g. manta2 14.616GB
).
The script search_manta2.py
provides programmatic access to MANTA2. It requires the following inputs:
- The name of the MANTA2 database in the MongoDB system (option
-d
) - The name of the server where the MongoDB system is hosted (option
-H
) - A user with
read
privileges to the MANTA2 database (option-u
) - The password for the previous user (option
-p
) - A file containing a list of variants in VCF, BED or GFF format (option
-i
)
Non-mandatory options include:
- The format of the input variant file (option
-t
; by default the script tries to identify the input format automatically) - The name of a file to output the results (option
-o
; by default is set to the standard output stream (stdout))
As a usage example, the MANTA2 database hosted at the Wasserman Lab can be accessed as follows: ./search_manta2.py -d manta2 -H manta.cmmt.ubc.ca -u manta_r -p mantapw -i <variant file>
.
A variant file can be obtained by executing the shell script get_VCF_example.sh
located in the ./examples/
folder. The resulting VCF file (i.e. chr20.vcf
) contains high-confidence SNP, small indel, and homozygous reference calls on chromosome 20 from the Genome in a Bottle (version 3.3.2) sample HG001 (Zook et al. 2014).
The search_manta2.py
script returns all TFBS predictions potentially impacted by these variants as tab-separated values. For each TFBS alteration, the script provides the variant information along with the associated wild-type (reference) and mutated (alternative) TFBS information, including:
- the chromosome and position of the variant;
- the reference and alternative alleles at that genomic location;
- the mutation ID (if the input file format allowed for it, otherwise the field is displayed as
.
); - the TF name and associated JASPAR profile ID;
- the start, end and strand, as well as the absolute (raw) and relative scores for both the reference and alternative TFBSs; and
- the impact score.
Users planning on performing large numbers of searches should create their local builds of the MANTA2 database (see Installation).
The MANTA2 database hosted at the Wasserman Lab can also be accessed via a dedicated web server at URL. Similar to the search_manta2.py
script, the server requires as input a list of variants in VCF, BED or GFF format, and it returns all TFBS predictions potentially impacted by these variants as a tab-separated values table. The table can be sorted on any column by clicking on the column header.