This is the source code for the paper
``Chromatin Accessibility Prediction via Convolutional Long Short-Term Memory Networks with k-mer Embedding'', Xu Min, Wanwen Zeng, Ning Chen, Ting Chen and Rui Jiang. Accepted by ISMB/ECCB 2017.
In this work, we address the problem of predicting chromatin accessibility from sequence information alone by proposing a convolutional long short-term memory network with k-mer embedding.
Xu Min, Wanwen Zeng, Ning Chen, Ting Chen, Rui Jiang; Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding, Bioinformatics, Volume 33, Issue 14, 15 July 2017, Pages i92–i101, https://doi.org/10.1093/bioinformatics/btx234
The code is mainly written in Python (2.7) using Keras (1.1.0) with the Theano backend. The required modules can be installed by following the instructions at https://keras.io/#installation.
The Anaconda platform is highly recommended.
First, we generate the sequence dataset and prepare the k-mer corpus for GloVe.
python ./generate_seqs.py -e 0
The k-mer length
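To illustrate what a k-mer corpus looks like, here is a minimal sketch of splitting a DNA sequence into overlapping k-mers and joining them into one whitespace-separated line, the kind of tokenized text GloVe reads. The function names and the defaults `k=6` and `stride=1` are hypothetical choices for illustration; they are not taken from generate_seqs.py.

```python
def split_into_kmers(sequence, k=6, stride=1):
    """Return the k-mers of `sequence`, one every `stride` bases.

    The defaults k=6 and stride=1 are illustrative assumptions only.
    """
    sequence = sequence.upper()
    # The last valid start position is len(sequence) - k.
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def corpus_line(sequence, k=6, stride=1):
    """Join the k-mers with spaces so each k-mer acts as one 'word'."""
    return " ".join(split_into_kmers(sequence, k, stride))


if __name__ == "__main__":
    # A 10-base toy sequence split into 6-mers with a stride of 2.
    print(corpus_line("ACGTACGTAC", k=6, stride=2))
```

A sequence shorter than `k` yields an empty list, so such sequences contribute nothing to the corpus in this sketch.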
Next, we train the k-mer embedding vectors with GloVe.
./demo.sh
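GloVe can write the trained vectors as a plain-text file with one token per line followed by its space-separated vector components. The sketch below parses that format into a dictionary. It assumes the text output format and a file named `vectors.txt`; the file this repository's demo.sh actually writes may differ, and the function names are hypothetical.

```python
import io


def parse_glove_lines(lines):
    """Map each token to its vector, given lines of 'token v1 v2 ... vn'."""
    vectors = {}
    for line in lines:
        parts = line.strip().split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


def load_glove_vectors(path):
    """Read a GloVe text file from `path` (file name is an assumption)."""
    with io.open(path, encoding="utf-8") as handle:
        return parse_glove_lines(handle)


if __name__ == "__main__":
    sample = ["ACGTAC 0.1 -0.2 0.3", "GTACGT 0.4 0.5 -0.6"]
    vectors = parse_glove_lines(sample)
    print(sorted(vectors))
```

For a real run, `load_glove_vectors("vectors.txt")` would replace the in-memory sample, with the path adjusted to wherever the vectors are saved.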
We train the supervised deep learning model based on the datasets and the pre-trained k-mer vectors.
THEANO_FLAGS='device=gpu0' python lstm.py -i 0 -batchsize 3000
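One common way to feed pre-trained vectors to a deep learning model is to build an index for every k-mer and stack the vectors into an embedding matrix whose rows match those indices. The following is a minimal sketch of that idea in plain Python. Whether lstm.py does this in the same way is an assumption, and the helper names, the reserved padding row 0, and the toy vectors are all hypothetical.

```python
def build_embedding_matrix(vectors, dim):
    """Return (kmer_to_index, matrix) from a {kmer: vector} dictionary.

    Row 0 is an all-zero row reserved for padding and unknown k-mers
    (an illustrative convention, not taken from the repository).
    """
    kmer_to_index = {}
    matrix = [[0.0] * dim]
    for kmer in sorted(vectors):  # sorted for a reproducible ordering
        vector = vectors[kmer]
        if len(vector) != dim:
            raise ValueError("vector for %s has length %d, expected %d"
                             % (kmer, len(vector), dim))
        kmer_to_index[kmer] = len(matrix)
        matrix.append(list(vector))
    return kmer_to_index, matrix


def encode(kmers, kmer_to_index, unknown=0):
    """Translate a list of k-mers into integer indices for an embedding layer."""
    return [kmer_to_index.get(kmer, unknown) for kmer in kmers]


if __name__ == "__main__":
    toy_vectors = {"ACG": [1.0, 2.0], "CGT": [3.0, 4.0]}  # toy data
    index, matrix = build_embedding_matrix(toy_vectors, dim=2)
    print(encode(["CGT", "NNN", "ACG"], index))
```

The resulting matrix could then be passed as the initial weights of an embedding layer, with the encoded index lists serving as the model input.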
The meaning of each parameter is documented in the Python script. Other models, including the DeepSEA baseline model and several variant deep learning architectures, are used in a similar way. The best trained models are saved in HDF5 files.