This is a fork of OpenProtein, a PyTorch framework for tertiary protein structure prediction.
- CASP7 dataset from ProteinNet is added.
- A deep residual CNN architecture is added.
You can use the following command:
__main__.py --use-gpu --evaluate-on-test --experiment-id deepprotein --minibatch-size 6 --learning-rate 0.001 --min-updates 14000
To run this project, clone the repository, install the dependencies with pipenv install, and then type pipenv run python __main__.py in the terminal to run the sample experiment:
$ pipenv run python __main__.py
------------------------
--- OpenProtein v0.1 ---
------------------------
Live plot deactivated, see output folder for plot.
Starting pre-processing of raw data...
Preprocessed file for testing.txt already exists.
force_pre_processing_overwrite flag set to True, overwriting old file...
Processing raw data file testing.txt
Wrote output to 81 proteins to data/preprocessed/testing.txt.hdf5
Completed pre-processing.
2018-09-27 19:27:34: Train loss: -781787.696391812
2018-09-27 19:27:35: Loss time: 1.8300042152404785 Grad time: 0.5147676467895508
...
See models.py for examples of how to create your own model.
See prediction.py for examples of how to use pre-trained models.
OpenProtein includes a preprocessing tool (preprocessing.py) that transforms the standard ProteinNet format into an HDF5 file and saves it in data/preprocessed/. The input is read line by line, so the raw file never has to fit in memory as a whole.
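The line-by-line idea can be illustrated with a small generator. This is a minimal sketch, not the code in preprocessing.py: the function name iter_proteinnet_records is hypothetical, and it assumes a simplified ProteinNet-style layout in which each record consists of bracketed section headers (such as [ID] and [PRIMARY]) followed by their values, with a blank line between records.

```python
import io
from typing import Dict, Iterator, List, TextIO


def iter_proteinnet_records(handle: TextIO) -> Iterator[Dict[str, List[str]]]:
    """Yield one record at a time from a ProteinNet-style text stream.

    Hypothetical helper for illustration; assumes blank-line-separated records.
    """
    record: Dict[str, List[str]] = {}
    section = None
    for line in handle:  # file objects iterate lazily, one line at a time
        line = line.strip()
        if not line:  # a blank line ends the current record
            if record:
                yield record
            record, section = {}, None
        elif line.startswith("[") and line.endswith("]"):
            section = line[1:-1]  # e.g. "[PRIMARY]" -> "PRIMARY"
            record[section] = []
        elif section is not None:
            record[section].append(line)
    if record:  # flush the last record if there is no trailing blank line
        yield record


# Tiny in-memory sample standing in for a raw data file (made-up values).
sample = io.StringIO(
    "[ID]\n1ABC\n[PRIMARY]\nMKV\n[MASK]\n+++\n\n"
    "[ID]\n2XYZ\n[PRIMARY]\nGA\n[MASK]\n+-\n"
)
records = list(iter_proteinnet_records(sample))
ids = [r["ID"][0] for r in records]
print(ids)  # ['1ABC', '2XYZ']
```

In a real pipeline each yielded record would be written to the output file straight away instead of being collected in a list, so only one record is held in memory at any time.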
The OpenProtein PyTorch data loader is memory-optimized too: when reading the HDF5 file, it loads only the samples needed for each minibatch into memory.
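The sketch below shows how per-minibatch loading can work with h5py, which reads only the rows you index rather than the whole dataset. It is an illustration under assumptions, not the OpenProtein data loader itself: the function load_minibatch, the dataset name "primary", and the demo file layout are all hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np


def load_minibatch(path, indices, dataset_name="primary"):
    """Read only the requested rows of one dataset from an HDF5 file.

    Hypothetical helper; rows are returned in increasing index order.
    """
    with h5py.File(path, "r") as f:
        dataset = f[dataset_name]    # a handle; no sample data is read yet
        rows = sorted(set(indices))  # h5py needs increasing, unique indices
        return dataset[rows]         # only these rows are read from disk


# Build a small demo file; the dataset name and shape are made up.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.hdf5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("primary", data=np.arange(40).reshape(10, 4))

batch = load_minibatch(demo_path, [7, 2, 5])
print(batch.shape)  # (3, 4)
```

Because the indices are sorted before the read, a caller that needs the original sample order would have to reorder the returned rows afterwards.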
Please see the LICENSE file in the root directory.