TensorFlow implementation of a protein secondary structure prediction network. In essence, this is a prediction task over variable-length sequences, so no prior knowledge of bioinformatics is needed to understand the code.
I don't think I am allowed to share the data, but if you want to try out the code, you can download the cullpdb and cb513 datasets from this link. You will need to modify utils.py to load the data.
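
To make the "variable sequence length" point concrete, here is a minimal sketch in plain Python of how padded positions can be excluded when scoring per-residue predictions. It is only an illustration: `pad_batch` and `masked_accuracy` are hypothetical names, not functions from utils.py or model.py, and the class ids are made up.

```python
# Hypothetical helpers for illustration; not taken from utils.py or model.py.

def pad_batch(sequences, pad_value=0):
    """Pad label sequences to the longest length and return a 0/1 mask."""
    max_len = max(len(seq) for seq in sequences)
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append(list(seq) + [pad_value] * n_pad)
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask


def masked_accuracy(predictions, labels, mask):
    """Per-residue accuracy counted only over real (unpadded) positions."""
    correct = 0
    total = 0
    for pred_row, label_row, mask_row in zip(predictions, labels, mask):
        for p, y, m in zip(pred_row, label_row, mask_row):
            if m:
                correct += int(p == y)
                total += 1
    return correct / float(total) if total else 0.0


# Two toy proteins of length 4 and 2 with made-up class ids (0 is padding).
labels, mask = pad_batch([[1, 2, 2, 3], [3, 1]])
predictions = [[1, 2, 3, 3], [3, 2, 0, 0]]
print(masked_accuracy(predictions, labels, mask))
```

The same idea carries over to the loss: multiply the per-position loss by the mask and divide by the number of real residues, so that padding does not affect training or evaluation.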
- Python 2.7
- TensorFlow 1.3 (>=1.0 should be fine for the most part, except for the tf.argmax() function and MultiRNNCell())
- feat66.ipynb - Initial notebook where most of the experiments live. Everything works except the last cell; the CASP11 dataset is not properly loaded yet.
- main.py - Run this to train the model.
- model.py - Restructured code for reusability and readability.
- utils.py - Preprocess the data.
- Continue to improve the code structure.
- Correctly incorporate batch normalization.
- Enable command-line parsing for hyperparameter search.
- Continue to improve the model.
- Initially, I relied heavily on vyraun's code.
- Danijar Hafner's post is super helpful in every aspect.
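
For the to-do item on command-line parsing for hyperparameter search, one possible starting point is the standard library's `argparse`. This is only a sketch: the flag names and default values are hypothetical and do not reflect what main.py currently uses.

```python
import argparse


def build_parser():
    """Build a parser for a few hypothetical hyperparameters."""
    parser = argparse.ArgumentParser(
        description="Train the secondary structure model (sketch).")
    parser.add_argument("--learning-rate", type=float, default=1e-3)
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--hidden-units", type=int, default=128)
    parser.add_argument("--epochs", type=int, default=10)
    return parser


# Parse an explicit argument list here; a real script would call
# parse_args() with no arguments to read from the command line.
args = build_parser().parse_args(["--learning-rate", "0.01", "--batch-size", "32"])
print(vars(args))
```

A search script could then launch main.py repeatedly with different flag values instead of editing constants in the source.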