Anchor: Trans-cell Type Prediction of Transcription Factor Binding Sites

This package implements Yuanfang Guan's winning algorithm in the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge.

background: ENCODE-DREAM

see also: Yuanfang Guan's 1st Place Solution and Original Code

Please contact gyuanfan@umich.edu or hyangl@umich.edu if you have any questions or suggestions.


Installation

Git clone a copy of ANCHOR:

git clone https://github.com/GuanLab/Anchor.git

Required dependencies

Required input files (and the directories to put them in)

Genomic coordinates

DNase-seq data (e.g. H1-hESC)

  • ./data/dnase_aln/ read alignment BAM files (one or multiple replicates)
  • ./data/dnase_fold_coverage/ fold-enrichment signal coverage tracks in bigWig format

DNA sequence and motif

  • ./data/hg_genome/ human genome sequence
  • ./data/motif/ TF motifs (e.g. motif)
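
Before running ANCHOR, it can help to confirm that the four input directories listed above exist and are not empty. The following is a minimal sketch, not part of the ANCHOR package: the names REQUIRED_DIRS and missing_inputs are hypothetical, and the check only looks for the presence of entries, not for correct file names or formats.

```python
from pathlib import Path

# Input directories named in this README, relative to the repository root.
REQUIRED_DIRS = [
    "data/dnase_aln",
    "data/dnase_fold_coverage",
    "data/hg_genome",
    "data/motif",
]


def missing_inputs(root="."):
    """Return the required directories that are absent or contain no entries.

    Hypothetical helper: it does not validate file names or file formats.
    """
    missing = []
    for rel in REQUIRED_DIRS:
        path = Path(root) / rel
        # any() over iterdir() is True as soon as one entry is found.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_inputs(".")
    if problems:
        print("Missing or empty input directories:")
        for rel in problems:
            print("  ./" + rel + "/")
    else:
        print("All input directories are present and non-empty.")
```

Run it from the root of the cloned repository so that the relative paths resolve against ./data/.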

Gencode

ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz

Prepare features and make predictions

Once the required input files are in the corresponding directories, ANCHOR is ready to go:

python ANCHOR.py -tf TAF1 -cell H1-hESC 

The prediction results are saved here:

./prediction/anchor/final/
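
To run the same command for several TF and cell type pairs, the call can be wrapped in a short script. This is a sketch under stated assumptions: anchor_command and run_pairs are hypothetical helper names, only the -tf and -cell flags shown above are used, and "python" is assumed to be the interpreter on your PATH. With dry_run=True the commands are printed rather than executed.

```python
import subprocess


def anchor_command(tf, cell):
    """Build the ANCHOR.py command line for one TF and one cell type."""
    # Only the -tf and -cell flags documented in this README are used.
    return ["python", "ANCHOR.py", "-tf", tf, "-cell", cell]


def run_pairs(pairs, dry_run=True):
    """Run (or, with dry_run=True, only print) ANCHOR.py for each pair.

    Hypothetical helper; returns the list of commands it built.
    """
    commands = [anchor_command(tf, cell) for tf, cell in pairs]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if ANCHOR.py exits non-zero.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # The pair below is the example from this README; append your own pairs.
    run_pairs([("TAF1", "H1-hESC")], dry_run=True)
```

Set dry_run=False to execute the commands from the repository root; the predictions for each run are then expected in ./prediction/anchor/final/ as described above.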

ANCHOR.py is the one-line-to-run version. The implementation details of feature generation and binding site prediction (with step-by-step explanations and example code) can be found here: DETAILS