enhancer_promoter_manuscript

Supplementary materials for "Sequence characteristics distinguish transcribed enhancers from promoters and predict their breadth of activity"

Guide to scripts and data for "Sequence characteristics distinguish transcribed enhancers from promoters and predict their breadth of activity"

Laura Colbran

01/15/2018

doi:10.1534/genetics.118.301895

bin/

avg_curves.py
    averages ROC and precision-recall (PR) curves across multiple classifiers; used when larger sets were split into smaller subsets
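
A minimal sketch of one common way to average curves from several classifiers: vertical averaging, where each curve is interpolated onto a shared x grid and the y values are averaged pointwise. This is an assumption for illustration, not the contents of avg_curves.py; the function name average_curves and the 101-point grid are hypothetical.

```python
import numpy as np

def average_curves(curves, n_points=101):
    """Vertically average curves given as (x, y) pairs on a shared x grid in [0, 1].

    Hypothetical helper. For ROC curves x = FPR and y = TPR;
    for PR curves x = recall and y = precision.
    """
    grid = np.linspace(0.0, 1.0, n_points)
    interpolated = []
    for x, y in curves:
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        # np.interp needs increasing x; PR curves are often ordered by decreasing recall
        order = np.argsort(x, kind="stable")
        interpolated.append(np.interp(grid, x[order], y[order]))
    # Pointwise mean across classifiers at each grid value
    return grid, np.mean(interpolated, axis=0)
```

Whether the repository script uses vertical averaging or another scheme (e.g. threshold averaging) is not stated here.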

enh-prom_analyses.ipynb
    contains R code for the relative ROC calculation (Fig. 3), TF motif analyses (Fig. 5), PCA, and k-mer weights (Fig. 4)

kmer_count.py
    counts occurrences of every length-k sequence (k-mer) in a set of genomic regions
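
A minimal sketch of k-mer counting, assuming the region sequences have already been extracted as strings (e.g. from a FASTA file). It pools counts over all input sequences, skips windows containing N or other non-ACGT characters, and does not merge reverse complements; whether kmer_count.py makes the same choices is not stated here. The function name count_kmers is hypothetical.

```python
from itertools import product

def count_kmers(sequences, k):
    """Count every length-k DNA sequence across a collection of sequences (hypothetical helper)."""
    # Start all 4**k possible k-mers at zero so every feature is present in the output
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k along the sequence
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in counts:  # windows with N or other symbols are skipped
                counts[kmer] += 1
    return counts
```

For per-region feature vectors, call it on one sequence at a time, e.g. count_kmers([seq], 6).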

set_length.py
    resizes every region in a BED file to the same length, keeping each region's center point
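
A minimal sketch of center-preserving resizing for BED lines (0-based, half-open coordinates). The rounding of odd-length centers and the clamping of starts at 0 are assumptions; how set_length.py handles those cases is not stated here. Both function names are hypothetical.

```python
def resize_interval(start, end, length):
    """Return (start, end) of a `length`-bp interval centered on [start, end) (hypothetical helper)."""
    center = (start + end) // 2
    # Clamp at chromosome start; this shifts the center for regions near position 0
    new_start = max(0, center - length // 2)
    return new_start, new_start + length

def resize_bed_lines(lines, length=600):
    """Yield tab-separated BED lines with columns 2-3 resized; extra columns are kept."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        fields = line.rstrip("\n").split("\t")
        start, end = resize_interval(int(fields[1]), int(fields[2]), length)
        fields[1], fields[2] = str(start), str(end)
        yield "\t".join(fields)
```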

data/

all_fantom_enhancers.bed
    Broad enhancers = all with #tiss > 45
    Context-specific enhancers = random subset of those with #tiss = 1
    regions were set to 600bp before use

all_fantom_prom.bed
    Broad promoters = random subset of those with mean_act > 372
    Context-specific promoters = all with mean_act < 9
    regions were set to 600bp before use
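
The selection rules for the two FANTOM files can be sketched as below. Rows are assumed to be dicts with hypothetical keys n_tissues (the #tiss value) and mean_act; the sizes of the random subsets are not given here, so they are parameters, and the fixed seed is an arbitrary choice for reproducibility.

```python
import random

def split_enhancers(rows, n_specific, seed=0):
    """Broad = all with #tiss > 45; context-specific = random subset with #tiss == 1."""
    broad = [r for r in rows if r["n_tissues"] > 45]
    specific_pool = [r for r in rows if r["n_tissues"] == 1]
    rng = random.Random(seed)
    specific = rng.sample(specific_pool, min(n_specific, len(specific_pool)))
    return broad, specific

def split_promoters(rows, n_broad, seed=0):
    """Broad = random subset with mean_act > 372; context-specific = all with mean_act < 9."""
    broad_pool = [r for r in rows if r["mean_act"] > 372]
    rng = random.Random(seed)
    broad = rng.sample(broad_pool, min(n_broad, len(broad_pool)))
    specific = [r for r in rows if r["mean_act"] < 9]
    return broad, specific
```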

roadmap_enhancers_600bp.bed
    filtered, set to 600bp

prom_enh_rel_ROC.txt
    values for Fig. 3 relative ROCs

roadmap_promoters_600bp.bed
    filtered, set to 600bp

tf_motif_specificity.csv
    FANTOM TSPS scores, IDs

classifiers/

Output and scripts from all SVM classifiers.
N.B. The classifier script requires Python 2.7.8 and Shogun Machine Learning Toolbox v4.0.0.

fantom_enhVSprom/
    direct classifiers between enhancers and promoters (Fig. 1)

fantom_enhVsprom_cgiMatched/
    direct classifiers between enhancers and promoters, stratified by CGI overlap
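
Stratifying regions by CpG island (CGI) overlap can be sketched as a simple interval test. This assumes regions and CGIs are (chrom, start, end) tuples in 0-based, half-open coordinates and that any overlap of at least 1 bp counts; the overlap criterion actually used is not stated here, and both function names are hypothetical. The scan is quadratic, so genome-scale inputs would normally go through a tool such as bedtools intersect instead.

```python
def overlaps_any(region, cgis):
    """True if the region shares at least 1 bp with any CGI on the same chromosome."""
    chrom, start, end = region
    return any(c == chrom and s < end and start < e for c, s, e in cgis)

def stratify_by_cgi(regions, cgis):
    """Split regions into (CGI-overlapping, non-overlapping) lists, preserving order."""
    with_cgi, without_cgi = [], []
    for region in regions:
        (with_cgi if overlaps_any(region, cgis) else without_cgi).append(region)
    return with_cgi, without_cgi
```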

broadVSspecific/
    classifiers between broad and specific regions (Fig. 2)

cgi_analyses/
    stratified by CGI status (Fig. 3)

roadmap_enhVSprom/
    direct classifiers between enhancers and promoters (Fig. 6)

enhVsprom_tf_matching/
    Tomtom output for the top 6-mers in direct classifiers between enhancers and promoters (Fig. 3B)

motif_sim/
    Tomtom output for the top 6-mers in other enhancer and promoter classifiers
        hocomoco/ (Fig. 5)
        jaspar/ (Figs. S11 & S12)

tf_counts/
    overall broad and narrow TF counts in regions (Fig. 5)