Deopen
Deopen is a hybrid deep learning framework that automatically learns the regulatory code of DNA sequences and predicts chromatin accessibility.
Requirements
- h5py
- hickle
- Scikit-learn=0.18.2
- Theano=0.8.0
- Lasagne=0.2.dev1
- nolearn=0.6.0
Installation
Download Deopen by cloning the repository:
git clone https://github.com/kimmo1019/Deopen
Installation has been tested on Linux and macOS with Python 2.7.
Instructions
Preprocessing data for model training
python Gen_data.py <options> -pos <positive_bed_file> -neg <negative_bed_file> -out <outputfile>
Arguments:
positive_bed_file: positive samples (bed format)
e.g. chr1 9995 10995
chr3 564753 565753
chr7 565935 566935
negative_bed_file: negative samples (bed format)
e.g. chr1 121471114 121472114
chr2 26268350 26269350
chr5 100783702 100784702
outputfile: preprocessed data for model training (hkl format)
Options:
-l <int> length of sequence (default: 1000)
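The example intervals above each span exactly 1000 bp, which matches the default value of -l. The following is a minimal sketch, not part of Deopen, for checking interval lengths before preprocessing. The helper names `parse_bed_line` and `check_bed_lengths` are hypothetical, and it is an assumption that Gen_data.py expects every interval to match the -l value.

```python
def parse_bed_line(line):
    """Split one BED line into (chrom, start, end)."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected at least 3 BED columns: %r" % line)
    return fields[0], int(fields[1]), int(fields[2])


def check_bed_lengths(lines, length=1000):
    """Return the intervals whose span (end - start) differs from `length`."""
    mismatches = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = parse_bed_line(line)
        if end - start != length:
            mismatches.append((chrom, start, end))
    return mismatches


# Positive-sample intervals taken from the example above.
positive = [
    "chr1 9995 10995",
    "chr3 564753 565753",
    "chr7 565935 566935",
]
print(check_bed_lengths(positive))  # prints []
```

An empty list means every interval has the requested length. Any tuples returned identify intervals that may need to be resized before running Gen_data.py.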
Run Deopen classification model
THEANO_FLAGS='device=gpu,floatX=float32' python Deopen_classification.py -in <inputfile> -out <outputfile>
Arguments:
inputfile: preprocessed data for model training (hkl format)
outputfile: prediction outcome to be saved (hkl format)
Run Deopen regression model
THEANO_FLAGS='device=gpu,floatX=float32' python Deopen_regression.py -in <inputfile> -reads <readsfile> -out <outputfile>
Arguments:
inputfile: preprocessed file containing different features (hkl format)
readsfile: read counts for each sample (hkl format)
outputfile: trained model to be saved (hkl format)
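When the two models are launched from a script rather than the shell, the commands above can be assembled programmatically. The following is a minimal sketch under stated assumptions: the helper names `theano_env`, `classification_cmd` and `regression_cmd` are hypothetical, and the file names are placeholders. `device=cpu` is Theano's CPU setting and may be useful on machines without a GPU.

```python
import os


def theano_env(device="gpu", floatx="float32", base_env=None):
    """Return a copy of the environment with THEANO_FLAGS set."""
    env = dict(os.environ if base_env is None else base_env)
    env["THEANO_FLAGS"] = "device=%s,floatX=%s" % (device, floatx)
    return env


def classification_cmd(inputfile, outputfile):
    """Argument list for the classification command shown above."""
    return ["python", "Deopen_classification.py",
            "-in", inputfile, "-out", outputfile]


def regression_cmd(inputfile, readsfile, outputfile):
    """Argument list for the regression command shown above."""
    return ["python", "Deopen_regression.py",
            "-in", inputfile, "-reads", readsfile, "-out", outputfile]


# Placeholder file names, for illustration only.
cmd = classification_cmd("train.hkl", "pred.hkl")
env = theano_env(device="gpu")
print(" ".join(cmd))
print(env["THEANO_FLAGS"])
# To actually launch the run, pass both to subprocess:
#   import subprocess
#   subprocess.check_call(cmd, env=env)
```

Keeping the command as a list and the flags in a separate environment dictionary avoids shell quoting problems with THEANO_FLAGS.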
Citation
Liu Q, Xia F, Yin Q, et al. Chromatin accessibility prediction via a hybrid deep convolutional neural network. Bioinformatics, 2017, 1: 7.
License
This project is licensed under the MIT License. See the LICENSE.md file for details.