Zhaoyi Zhang, Songyang Cheng, Claudia Solis-Lemus
This repository contains the scripts for the Zhang et al. (2022) manuscript:
@article{Zhang_Cheng_Solis-Lemus_2022,
title={Towards a robust out-of-the-box neural network model for genomic data},
volume={23},
DOI={10.1186/s12859-022-04660-8},
number={1},
journal={BMC Bioinformatics},
author={Zhang, Zhaoyi and Cheng, Songyang and Solis-Lemus, Claudia},
year={2022},
pages={125}
}
Joint first authors with equal contribution (Zhang, Zhaoyi and Cheng, Songyang). Order determined randomly by scripts/order-authors.jl.
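As a rough illustration of how a reproducible random author order can be drawn, here is a minimal Python sketch. It is not the contents of scripts/order-authors.jl (which is a Julia script); the function name `random_author_order` and the seed value are hypothetical and chosen only for this example.

```python
import random


def random_author_order(authors, seed):
    """Return a reproducible random ordering of the given authors.

    Hypothetical helper for illustration; not taken from the repository.
    """
    rng = random.Random(seed)  # local generator, leaves global random state untouched
    shuffled = list(authors)   # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    return shuffled


joint_first_authors = ["Zhaoyi Zhang", "Songyang Cheng"]
order = random_author_order(joint_first_authors, seed=2022)  # seed is an assumption
print(order)
```

Fixing the seed means anyone rerunning the script obtains the same order, which is what makes a random choice auditable.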
We used publicly available data from the following manuscripts:
- Zeng H., Edwards M.D., Gifford D.K. (2015) "Convolutional Neural Network Architectures for Predicting DNA-Protein Binding". Proceedings of Intelligent Systems for Molecular Biology (ISMB) 2016 Bioinformatics, 32(12):i121-i127. doi: 10.1093/bioinformatics/btw255. Motif data link, Paper link
- Nguyen, N.G., Tran, V.A., Ngo, D.L., Phan, D., Lumbanraja, F.R., Faisal, M.R., Abapihi, B., Kubo, M., Satou, K. (2016) "DNA sequence classification by convolutional neural network". JBiSE 09(05), 280–286. Splice data link, Histone data link, Paper link
Python functions to download, clean and reformat the input data:
All scripts and output files corresponding to the CNN models are in the cnn folder.
- cnn/dna_nn/model.py contains the CNN models
- Jupyter notebooks contain the reproducible steps to run the analyses on each of the datasets:
All the scripts and output files corresponding to the NLP models are in the nlp folder. Jupyter notebooks contain the reproducible steps to run the analyses on each of the datasets.
- nlp/uci_baseline_adam256.ipynb
- nlp/histone_lstm_layer_adam.ipynb
- nlp/histone_lstm_layer_sgd.ipynb
- nlp/discovery_baseline.py
- nlp/uci_ae_adam32.ipynb
- nlp/uci_ae_adam256.ipynb
- nlp/histone_ae_adam32.ipynb
- nlp/histone_ae_adam256.ipynb
- nlp/histone_ae_adam1024.ipynb
- nlp/discovery_ae.py
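For readers unfamiliar with applying NLP-style models to DNA, the sketch below shows one common way to turn a sequence into integer tokens: splitting it into overlapping k-mers and mapping each k-mer to an index. This is an assumption made for illustration, not necessarily the preprocessing used in the nlp notebooks; the names `kmer_tokens`, `build_vocab` and `encode`, and the choice `k=3`, are hypothetical.

```python
def kmer_tokens(sequence, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(token_lists):
    """Assign an integer index to each distinct k-mer, in order of first appearance.

    Index 0 is reserved for padding and for k-mers not seen when building the vocabulary.
    """
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def encode(tokens, vocab):
    """Map k-mers to their indices, using 0 for unknown k-mers."""
    return [vocab.get(tok, 0) for tok in tokens]


tokens = kmer_tokens("ACGTA", k=3)
vocab = build_vocab([tokens])
encoded = encode(tokens, vocab)
print(tokens)   # ['ACG', 'CGT', 'GTA']
print(encoded)  # [1, 2, 3]
```

Integer sequences of this kind can be fed to an embedding layer ahead of a recurrent or autoencoder model.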
All figures in the manuscript were created with the R script in plots/final-plots.Rmd.
Our code is licensed under the MIT License © Solis-Lemus lab projects (2021).
Please use the GitHub issue tracker to report any issues or difficulties with the current code.