Ocelot: A Python repository from Hongyang449

Ocelot: Improved Epigenome Imputation Reveals Asymmetric Predictive Relationships Across Histone Modifications

Ocelot is a machine learning approach for imputing epigenomes across tissues and cell types. It ranked first in the ENCODE Imputation Challenge, achieving high accuracy on held-out prospective data. Beyond predictive performance, it offers a new way to investigate cross-histone regulation using large-scale epigenomics datasets. If you have any questions or suggestions, please contact hyangl@umich.edu or gyuanfan@umich.edu.

Installation

Clone a copy of the code:

git clone https://github.com/GuanLab/Ocelot.git

Required dependencies

python (3.6.5)
numpy (1.13.3). It comes pre-packaged in Anaconda.
pyBigWig A package for quick access to and creation of bigWig files.

conda install pybigwig -c bioconda

lightgbm (2.3.0) A gradient boosting tree-based algorithm with fast training speed and high efficiency.

pip install lightgbm

tensorflow (1.14.0) A popular deep learning package.

conda install tensorflow-gpu

keras (2.2.5) A popular deep learning package using tensorflow backend.

conda install keras
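Before running any of the scripts, it can help to confirm that the packages listed above are importable in the active environment. The following is a minimal sketch, not part of the Ocelot repository: the helper name `check_dependencies` is hypothetical, and it only tests whether each module can be found, not whether its version matches the one listed above.

```python
import importlib.util

# Import names of the dependencies listed above.
REQUIRED_MODULES = ["numpy", "pyBigWig", "lightgbm", "tensorflow", "keras"]


def check_dependencies(modules):
    """Return {module_name: True/False} depending on whether it can be found.

    Hypothetical helper for illustration; it does not import the modules,
    it only asks the import system whether a top-level module exists.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_dependencies(REQUIRED_MODULES).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Any module reported as missing can be installed with the corresponding command above.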

Dataset

Code of Ocelot and evaluation on the challenge data

data_challenge
code_challenge

Reproducing all of these imputations and evaluations requires considerable time, even with supercomputing resources. We therefore also provide the processed data, trained models, and predictions, together with the scripts needed to reproduce them.

For benchmarking, predictions from Avocado and ChromImpute are also provided:

3a. Avocado predictions
4a. ChromImpute models and predictions

Mapping between letter, id and histone mark in challenge

For simplicity, we map the epigenetic marks to capital letters as follows:

letter	id	mark
C	M02	DNase-seq
D	M18	H3K36me3
E	M17	H3K27me3
F	M16	H3K27ac
G	M20	H3K4me1
H	M22	H3K4me3
I	M29	H3K9me3
J	M01	ATAC-seq

For example, in the "CDEH_I" design, we used four marks (C, D, E, H) as cell type-specific features to predict mark I.
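The design naming convention can be decoded programmatically. The sketch below is illustrative only and is not taken from the repository: the function name `parse_design` is hypothetical, and it assumes every design string has the form `<feature letters>_<single target letter>`, as in the "CDEH_I" example.

```python
# Letter -> (challenge id, mark), copied from the mapping table above.
LETTER_TO_MARK = {
    "C": ("M02", "DNase-seq"),
    "D": ("M18", "H3K36me3"),
    "E": ("M17", "H3K27me3"),
    "F": ("M16", "H3K27ac"),
    "G": ("M20", "H3K4me1"),
    "H": ("M22", "H3K4me3"),
    "I": ("M29", "H3K9me3"),
    "J": ("M01", "ATAC-seq"),
}


def parse_design(design):
    """Split a design string such as "CDEH_I" into feature marks and target mark.

    Hypothetical helper; assumes the "<features>_<target>" format with a
    single-letter target.
    """
    features, sep, target = design.partition("_")
    if not sep or not features or len(target) != 1:
        raise ValueError(f"unexpected design format: {design!r}")
    for letter in features + target:
        if letter not in LETTER_TO_MARK:
            raise ValueError(f"unknown mark letter: {letter!r}")
    feature_marks = [LETTER_TO_MARK[letter][1] for letter in features]
    return feature_marks, LETTER_TO_MARK[target][1]


if __name__ == "__main__":
    feature_marks, target_mark = parse_design("CDEH_I")
    print(feature_marks, "->", target_mark)
```

For "CDEH_I", the features decode to DNase-seq, H3K36me3, H3K27me3, and H3K4me3, and the target decodes to H3K9me3.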

Data processing and model building scripts for ENCODE3 imputation

data_encode3
code_encode3

Data processing and model building scripts for Roadmap imputation

data_roadmap
code_roadmap

Code for SHAP analysis

code_shap

Code of Ocelot final submission to the ENCODE Imputation Challenge

challenge solution
