This package contains ANCHOR, Yuanfang Guan's winning algorithm in the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge.
Background: ENCODE-DREAM
See also: Yuanfang Guan's 1st Place Solution and Original Code
If you have any questions or suggestions, please contact gyuanfan@umich.edu or hyangl@umich.edu.
Clone a copy of ANCHOR with git:
git clone https://github.com/GuanLab/Anchor.git
Dependencies
- perl (5.10.1)
- python (3.6.5)
- numpy (1.13.3; comes pre-packaged with Anaconda)
- opencv (3.4.2)
- samtools (1.1)
- bigWigToBedGraph
- xgboost
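Before running ANCHOR, it can help to confirm that the dependencies above are reachable. The following is a minimal Python sketch, not part of ANCHOR. It assumes the command-line tools are called `perl`, `samtools` and `bigWigToBedGraph` on your `PATH`, and that the Python packages import as `numpy`, `cv2` (opencv) and `xgboost`. The function name `check_dependencies` is hypothetical, and the sketch does not verify the version numbers listed above.

```python
import importlib.util
import shutil

# Assumed executable names on PATH (not taken from ANCHOR's own code).
TOOLS = ["perl", "samtools", "bigWigToBedGraph"]
# Assumed import names; opencv is imported as cv2.
PACKAGES = ["numpy", "cv2", "xgboost"]


def check_dependencies(tools=TOOLS, packages=PACKAGES):
    """Return a dict mapping each dependency name to True if it was found."""
    status = {}
    for tool in tools:
        # shutil.which returns the full path, or None if the tool is not on PATH.
        status[tool] = shutil.which(tool) is not None
    for pkg in packages:
        # find_spec returns None when a top-level module cannot be found.
        status[pkg] = importlib.util.find_spec(pkg) is not None
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:20s} {'ok' if found else 'MISSING'}")
```

Anything reported as `MISSING` should be installed before launching ANCHOR.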
Genomic coordinates
- ./data/ref/ genomic coordinates under consideration (e.g. test_regions.blacklistfiltered.bed)
DNase-seq data (e.g. H1-hESC)
- ./data/dnase_aln/ read alignment BAM files (one or more replicates)
- ./data/dnase_fold_coverage/ fold-enrichment signal coverage tracks (bigWig files)
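The fold-enrichment bigWig track can be inspected as plain text with `bigWigToBedGraph`, one of the listed dependencies. The sketch below only builds that command for an optional region using the UCSC tool's `-chrom`, `-start` and `-end` options. It illustrates the tool; it does not describe how ANCHOR itself reads the track (see DETAILS for that). The helper names and file names are hypothetical.

```python
import subprocess


def bigwig_to_bedgraph_cmd(bigwig, bedgraph, chrom=None, start=None, end=None):
    """Build a bigWigToBedGraph command line, optionally restricted to a region."""
    cmd = ["bigWigToBedGraph", bigwig, bedgraph]
    # Region options are only added when given; 0 is a valid start coordinate.
    if chrom is not None:
        cmd.append(f"-chrom={chrom}")
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    return cmd


def run(cmd):
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; substitute the bigWig you placed in the folder.
    cmd = bigwig_to_bedgraph_cmd(
        "./data/dnase_fold_coverage/H1-hESC.bigwig",
        "H1-hESC.chr1.bedGraph",
        chrom="chr1", start=0, end=1000000,
    )
    # Print the command; call run(cmd) once the input file exists.
    print(" ".join(cmd))
```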
DNA sequence and motif
- ./data/hg_genome/ human genome sequence
- ./data/motif/ TF motifs (e.g. motif)
GENCODE annotation
- ./data/ref/gencode.v19.annotation.gtf
- It can be downloaded from:
ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
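A minimal Python sketch for fetching and unpacking the annotation into `./data/ref/`, assuming the FTP URL above is still reachable and that ANCHOR expects the uncompressed file name `gencode.v19.annotation.gtf` given earlier. The function names `gunzip` and `fetch_gencode` are hypothetical.

```python
import gzip
import os
import shutil
import urllib.request

GENCODE_URL = (
    "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/"
    "release_19/gencode.v19.annotation.gtf.gz"
)


def gunzip(src, dst):
    """Decompress the gzip file src into dst, streaming to limit memory use."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def fetch_gencode(ref_dir="./data/ref", url=GENCODE_URL):
    """Download the GENCODE v19 GTF into ref_dir unless it is already there."""
    os.makedirs(ref_dir, exist_ok=True)
    gz_path = os.path.join(ref_dir, os.path.basename(url))
    gtf_path = gz_path[: -len(".gz")]  # e.g. ./data/ref/gencode.v19.annotation.gtf
    if not os.path.exists(gtf_path):
        urllib.request.urlretrieve(url, gz_path)  # urllib handles ftp:// URLs
        gunzip(gz_path, gtf_path)
        os.remove(gz_path)  # keep only the uncompressed GTF
    return gtf_path
```

Calling `fetch_gencode()` from the repository root downloads the file once; later calls return the existing path without touching the network.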
Once the required input files are in the corresponding directories, ANCHOR is ready to run:
python ANCHOR.py -tf TAF1 -cell H1-hESC
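The same call can be wrapped in a small Python driver that first checks for the input folders listed above. This is a hypothetical sketch, not part of ANCHOR: it uses only the `-tf` and `-cell` options shown in the command, assumes it runs from the repository root next to `ANCHOR.py`, and the names `missing_inputs`, `anchor_command` and `run_anchor` are illustrative.

```python
import os
import subprocess
import sys

# Input folders named in this README; ANCHOR may expect specific files inside them.
REQUIRED_DIRS = [
    "./data/ref",
    "./data/dnase_aln",
    "./data/dnase_fold_coverage",
    "./data/hg_genome",
    "./data/motif",
]


def missing_inputs(root=".", required=REQUIRED_DIRS):
    """Return the required input folders that do not exist under root."""
    return [d for d in required if not os.path.isdir(os.path.join(root, d))]


def anchor_command(tf, cell):
    """Build the one-line ANCHOR call using only the -tf and -cell options."""
    # sys.executable reuses the current interpreter instead of whatever `python` is on PATH.
    return [sys.executable, "ANCHOR.py", "-tf", tf, "-cell", cell]


def run_anchor(tf, cell, root="."):
    """Check the input folders, then run ANCHOR from the repository root."""
    missing = missing_inputs(root)
    if missing:
        raise FileNotFoundError(f"missing input folders: {missing}")
    subprocess.run(anchor_command(tf, cell), cwd=root, check=True)
```

For example, `run_anchor("TAF1", "H1-hESC")` reproduces the command above. The folder check is deliberately shallow; the exact files ANCHOR reads inside each folder are described in DETAILS.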
The prediction results are saved here:
./prediction/anchor/final/
This is the one-line-to-run version. The implementation details of feature generation and binding site prediction (with step-by-step explanations and example code) can be found here: DETAILS