Leopard: A Python repository from yynst2

Leopard: fast decoding cell type-specific transcription factor binding landscape at single-nucleotide resolution

Leopard is a deep learning approach to predict cell type-specific in vivo transcription factor binding sites with high accuracy, speed and resolution (Hongyang Li, Yuanfang Guan. bioRxiv, 2019. doi: https://doi.org/10.1101/856823). Please contact hyangl@umich.edu or gyuanfan@umich.edu if you have any questions or suggestions.

Installation

Clone a copy of the code:

git clone https://github.com/GuanLab/Leopard.git

Required dependencies

python (3.6.5)
numpy (1.13.3). It comes pre-packaged in Anaconda.
pyBigWig: a package for quick access to and creation of bigwig files. It can be installed by:

conda install pybigwig -c bioconda

tensorflow (1.14.0): a popular deep learning package. It can be installed by:

conda install tensorflow-gpu

keras (2.2.5): a popular deep learning package that uses the tensorflow backend. It can be installed by:

conda install keras
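Before running anything, it can help to confirm that the dependencies listed above are importable in the active environment. The following is a minimal sketch, not part of Leopard itself: the helper name `find_missing` is hypothetical, and the check only tests whether each module can be located, not whether its version matches the ones listed.

```python
import importlib.util

# Import names of the dependencies listed above.
# Note that pyBigWig is imported with this exact capitalization.
REQUIRED_MODULES = ["numpy", "pyBigWig", "tensorflow", "keras"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment.

    Hypothetical helper for illustration; it does not check versions.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All required modules were found.")
```

If any name is reported as missing, install it with the corresponding conda command above.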

Dataset

The data in bigwig format can be directly downloaded from our web server:

Before running Leopard, please download the above data (30GB) and place them in the "Leopard/data/" folder. The DNA sequence bigwig files are always needed. To make predictions on a single cell type, you only need to download "avg.bigwig" and the corresponding DNase-seq file for that cell type. The ChIP-seq data are optional; you only need them if you want to re-train or adapt our models, or to compare predictions with experimental observations.
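A quick way to verify that the needed files are in place is to check the data folder before launching a prediction. This is only a sketch under assumptions: the helper `missing_inputs` is hypothetical, and apart from "avg.bigwig" the file name in the example ("K562.bigwig") is a placeholder, since the actual names depend on what was downloaded from the server.

```python
import os


def missing_inputs(data_dir, filenames):
    """Return the file names from `filenames` that are not present in `data_dir`.

    Hypothetical helper for illustration; it is not part of Leopard.
    """
    return [
        name for name in filenames
        if not os.path.isfile(os.path.join(data_dir, name))
    ]


if __name__ == "__main__":
    # "avg.bigwig" is named in this README; "K562.bigwig" is a placeholder
    # for the DNase-seq file of the cell type of interest (assumed name).
    needed = ["avg.bigwig", "K562.bigwig"]
    # Assumes the script is run from the repository root, where data/ lives.
    for name in missing_inputs("data", needed):
        print("Missing input file: " + name)
```

Replace the placeholder names with the actual files for your transcription factor and cell type.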

The original data can be found as follows:

The DNase-seq data were downloaded from the ENCODE-DREAM challenge website (filtered alignment).

The ChIP-seq data were downloaded from the ENCODE-DREAM challenge website (conservative peaks and fold enrichment) and the ENCODE project (the accession numbers are provided in Supplementary Table 4).

Run Leopard predictions

Once the required input files are in the corresponding directories, Leopard is ready to run (fast mode):

python Leopard.py -tf E2F1 -te K562 -chr chr21 chr22

Or you can run the complete mode with higher accuracy and longer runtime:

python Leopard.py -tf E2F1 -te K562 -chr chr21 -m complete
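To run several transcription factors or cell types in sequence, the commands above can be assembled programmatically. The sketch below uses only the flags shown in this README (`-tf`, `-te`, `-chr`, `-m`); the function names `build_command` and `run_leopard` are hypothetical, and the code assumes it is executed from the directory containing Leopard.py.

```python
import subprocess


def build_command(tf, cell_type, chromosomes, mode=None):
    """Assemble the Leopard command line as a list of arguments.

    Only flags shown in the README are used. When `mode` is None, no -m flag
    is passed, which corresponds to the default (fast) invocation above.
    """
    cmd = ["python", "Leopard.py", "-tf", tf, "-te", cell_type, "-chr"]
    cmd += list(chromosomes)
    if mode is not None:
        cmd += ["-m", mode]
    return cmd


def run_leopard(tf, cell_type, chromosomes, mode=None):
    """Run Leopard once and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(tf, cell_type, chromosomes, mode), check=True)


if __name__ == "__main__":
    # Print the two commands from this README without executing them.
    print(" ".join(build_command("E2F1", "K562", ["chr21", "chr22"])))
    print(" ".join(build_command("E2F1", "K562", ["chr21"], mode="complete")))
```

Calling `run_leopard("E2F1", "K562", ["chr21"], mode="complete")` would launch the second command shown above.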

The prediction npy files are saved in the ./output/ folder.
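The saved predictions can be inspected with numpy. The following is a minimal sketch: the helper `load_predictions` is hypothetical, and because the output file names and array layout are not described here, it simply loads every .npy file in the folder and reports its shape and dtype.

```python
import glob
import os

import numpy as np


def load_predictions(output_dir="output"):
    """Load every .npy file in `output_dir` into a dict keyed by file name.

    Hypothetical helper for illustration; no assumption is made about how
    Leopard names its output files or shapes its arrays.
    """
    predictions = {}
    for path in sorted(glob.glob(os.path.join(output_dir, "*.npy"))):
        predictions[os.path.basename(path)] = np.load(path)
    return predictions


if __name__ == "__main__":
    # Assumes the script is run from the repository root, next to ./output/.
    for name, array in load_predictions().items():
        print(name, array.shape, array.dtype)
```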
