ProtRAP
A deep-learning-based predictor of the relative lipid accessibility and relative solvent accessibility of each residue in a given protein sequence.
Introduction
Solvent accessibility has been extensively used to characterize and predict the chemical properties of the surface residues of soluble proteins. However, there is not yet a widely accepted analogous quantity for studying the lipid-accessible residues of membrane proteins. Here we propose that lipid accessibility, defined in a similar way to solvent accessibility, can be used to characterize these residues. Details can be found in the paper listed under Reference.
In models.py, we provide the definition and implementation of the final model (Transformer light).
driver.py is a simple demonstration of how to process input data and run the models.
prot.feat and prot.fasta are example input files.
Quick start
We provide a prediction server for researchers who only need predictions for small batches of sequences:
http://www.songlab.cn/ProtRAP/Introduction/
Requirements
- PyTorch
- NumPy
Feature generation
Our model requires three per-residue input features: one-hot encoded sequence information (20 dimensions), a PSSM (20 dimensions), and predicted three-state secondary structure, SS3 (3 dimensions), for 43 dimensions in total.
The one-hot encoding uses the residue order ACDEFGHIKLMNPQRSTVWY. We provide the seq2arr function in driver.py for this step.
The PSSM and predicted SS3 were generated with RaptorX-Property, aligning against the uniclust30_2017_10 database. The resulting files have the suffix .feat, and we provide the load_feat function in driver.py to read them.
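The residue ordering above can be illustrated with a minimal sketch. This is not the seq2arr function from driver.py; the name `one_hot_encode` is hypothetical, and leaving non-standard residues as all-zero rows is an assumption that driver.py may not share.

```python
import numpy as np

# Residue order stated in this README.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot_encode(sequence: str) -> np.ndarray:
    """Return an (L, 20) float32 array with one row per residue.

    Assumption: residues outside the 20 standard letters (e.g. X)
    become all-zero rows; driver.py may handle them differently.
    """
    encoded = np.zeros((len(sequence), len(ALPHABET)), dtype=np.float32)
    for pos, aa in enumerate(sequence.upper()):
        col = INDEX.get(aa)
        if col is not None:
            encoded[pos, col] = 1.0
    return encoded


example = one_hot_encode("ACDY")
print(example.shape)                     # (4, 20)
print(example.argmax(axis=1).tolist())   # [0, 1, 2, 19]
```

Each row has a single 1 in the column given by the residue's position in the alphabet string, so A maps to column 0 and Y to column 19.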
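To make the input dimensions concrete, the sketch below stacks the three feature groups into one array of 43 columns per residue. The function name `assemble_features` is hypothetical, the column order (one-hot, PSSM, SS3) is an assumption, and the PSSM and SS3 values are placeholders rather than real RaptorX-Property output; check driver.py for the layout the released weights expect.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def assemble_features(sequence, pssm, ss3):
    """Stack per-residue features into one (L, 43) float32 array.

    Assumption: columns are ordered one-hot (20), PSSM (20), SS3 (3).
    """
    length = len(sequence)
    pssm = np.asarray(pssm, dtype=np.float32)
    ss3 = np.asarray(ss3, dtype=np.float32)
    if pssm.shape != (length, 20):
        raise ValueError(f"PSSM must have shape ({length}, 20), got {pssm.shape}")
    if ss3.shape != (length, 3):
        raise ValueError(f"SS3 must have shape ({length}, 3), got {ss3.shape}")

    # One-hot block, using the residue order given in this README.
    one_hot = np.zeros((length, 20), dtype=np.float32)
    for pos, aa in enumerate(sequence):
        col = ALPHABET.find(aa)
        if col >= 0:
            one_hot[pos, col] = 1.0

    return np.concatenate([one_hot, pssm, ss3], axis=1)


# Placeholder inputs standing in for real .feat contents.
seq = "MKTAY"
rng = np.random.default_rng(0)
fake_pssm = rng.normal(size=(len(seq), 20))
fake_ss3 = np.full((len(seq), 3), 1.0 / 3.0)

features = assemble_features(seq, fake_pssm, fake_ss3)
print(features.shape)  # (5, 43)
```

The shape checks fail early when the PSSM or SS3 arrays do not match the sequence length, which is a common symptom of pairing a .feat file with the wrong sequence.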
Usage
First, download the weights file provided in Releases. Then call torch.load(absolute path) to load the model.
driver.py offers a simpler route. It processes the input data and downloads the models automatically:
python driver.py --input_path prot
It can also download the ten models trained in ten-fold cross-validation and average their predicted values to give a more stable result:
python driver.py --input_path prot --ten_average True
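The averaging step can be sketched as a plain arithmetic mean over the per-model outputs. This is an illustration only: the function name `average_predictions` is hypothetical, the toy arrays stand in for real model outputs, and whether driver.py averages in exactly this way is an assumption.

```python
import numpy as np


def average_predictions(predictions):
    """Average per-residue predictions from several models.

    `predictions` is a sequence of equally shaped arrays, one per model.
    np.stack raises a ValueError if the shapes differ.
    """
    stacked = np.stack([np.asarray(p, dtype=np.float64) for p in predictions])
    return stacked.mean(axis=0)


# Toy stand-in for ten fold models: a shared profile plus per-model noise.
rng = np.random.default_rng(0)
profile = np.array([0.10, 0.45, 0.80, 0.30])
per_model = [profile + rng.normal(scale=0.05, size=profile.shape) for _ in range(10)]

ensemble = average_predictions(per_model)
print(ensemble.shape)  # (4,)
```

Averaging over folds reduces the variance contributed by any single model, which is why the ensemble output tends to be more stable than one model's prediction.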
To predict the output for all files in a directory:
python driver.py --input_path <path_to_input_directory> --input_type dir --ten_average True
Reference
Kang, K., Wang, L., & Song, C. (2023). ProtRAP: Predicting Lipid Accessibility Together with Solvent Accessibility of Proteins in One Run. Journal of Chemical Information and Modeling.