TREDNet is now published at PNAS: https://www.pnas.org/doi/10.1073/pnas.2206612120
TREDNet is a modular deep neural network architecture which allows building models to predict regulatory regions. It comprises two phases:
This phase is similar to the standard in the field of genomics, multitask classification setup based convolutional neural networks (CNN) such as DeepSEA, Beluga, DanQ etc. It predicts 1924 genomic features of transcription factors, histone modifications and DNase-I Hypersensitivity Sites in 123 different tissues and cell-lines.
This phase is a smaller CNN, which is trained on enhancer regions. Let's say we have a set of ehnancers defined as regions of chromatin accessibility which overlap H3K27ac and H3K4me1 histone modification regions. Then you train the phase two model on these enhancer regions, however each region is first passed through the first phase and converted to a vector of length 1924.
The model can be built as a single output classifier (enhancers of a single tissue) or multitask classifier (enhancers of multiple tissues)
TREDNet is developed using Python and uses keras/tensorflow for training DL models.
It requires a CUDA-capable linux computer with a GPU node and >150GB memory for training the models from scratch. For running precidtions having a GPU machine is recommended but not required.
Data preparation step requires:
- GRCh37/hg19 assembly of the human genome. The fasta file can be downloaded from:
There are two ways to run predictions on input sequences:
TREDNet models are deposited in the Kipoi database: http://kipoi.org/models/TREDNet/
Easiest way is to run the model through Kipoi interface. To do so, install the kipoi
package first:
pip install kipoi
Then use the -use-kipoi
flag when running the score_variant.py
command. When you run the command for the first time, Kipoi package will download the models of two phases.
If Kipoi proves to be too much of an overhead, run: bash data_download.sh
script to download the models. The models will be downloaded to ./data
directory.
Dependencies of TREDNet can be installed with pip as follows.
pip install requirements.txt
Then run the score_variant.py
command with -use-kipoi
flag omitted, since by default it is set to False
.
./score_variant.py [options]
Options:
-vcf-file
: "VCF file containing variants for calculating the mutational scores."-phase-one-file
: "Phase one model weights (hdf5) file. Skip if using-use-kipoi
flag"-phase-two-name
: "Phase two model's cell name. Allowed options: islet, HepG2, K562"-save-file
: "File to save the scores"-hg19-fasta
: "Fasta file to hg19 assembly"-score-delta
: "Generate delta scores"-score-iep
: "Generate IEP scores"-use-kipoi
: "Use Kipoi models for running the models"
There are two ways to run predictions on input sequences:
PAS/DAS detection requires pre-generated delta scores. Input file required in h5 format.
Then run the peak_detection/annotate.py
peak_detection/annotate.py delta_file model_dir save_file
Modeling islet enhancers using deep learning identifies candidate causal variants at loci associated with T2D and glycemic traits
Sanjarbek Hudaiberdiev, D. Leland Taylor, Wei Song, Narisu Narisu, Redwan M. Bhuiyan, Henry J. Taylor, Xuming Tang, Tingfen Yan, Amy J. Swift, Lori L. Bonnycastle, DIAMANTE Consortium, Shuibing Chen, Michael L. Stitzel, Michael R. Erdos, Ivan Ovcharenko, and Francis S. Collins
Proceedings of National Academy of Science (PNAS) 2023 120 (35) e2206612120. DOI: 10.1073/pnas.2206612120
Open an issue on this GitHub repo or email to the first author of the publication.