Prediction of protein-protein interactions using sequences of intrinsically disordered regions
We provide three models:
- Asymmetric model: This prediction model is designed to analyze protein pairs that share a common protein within our model, as described in the paper. If your protein pairs do not have a common protein in our model, the model will not produce a valid output.
- Unified model: Depending on the pair type, this model uses either the symmetric or the asymmetric model. If you are uncertain which model to use for testing the interactions, use this one.
- Target prediction model: This model is designed to identify potential interaction partners for a given protein. Unlike other prediction models that require pairs of proteins as input, it only requires a single protein of interest. The model analyzes the intrinsically disordered regions (IDRs) of the provided protein and uses this information to generate a list of potential interaction partners.
Dependencies:
- R
- R libraries: protR
- Python 3.9
- Python libraries: sklearn (scikit-learn), matplotlib, pandas, click; pickle, csv, and sys are part of the Python standard library
You can create the environment using the enviroment.yml file.
Clone this repository and cd into it:
git clone https://github.com/gozdekibar/IDR_PPI_prediction.git
cd ./IDR_PPI_prediction
The following example predicts the interactions of the retinoblastoma protein with the asymmetric model, as shown in the paper.
Before running the code, generate the input feature matrix from the IDR sequences of the input proteins by running the following R script.
- Input: IDR sequences as a FASTA file. IDR sequences should be longer than 15 amino acids (example input: test_input_RB_IDRs.fa in the sequences folder).
Run:
Rscript ./R/extractFeatures_protR.R ./sequences/test_input_RB_IDRs.fa ./features/output_RB1_protR.txt
This command will take the input fasta file located at ./sequences/test_input_RB_IDRs.fa and preprocess it for use with our model. The preprocessed data will be output to a file located at ./features/output_RB1_protR.txt.
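Because the feature extraction step expects IDR sequences longer than 15 amino acids, it can help to check the FASTA file before running the R script. The following is a minimal sketch of such a check, not part of the repository: the function names, the toy headers, and the toy sequences are all made up for illustration, and it assumes a plain FASTA layout with `>` header lines.

```python
from typing import Iterable, Iterator, List, Tuple

MIN_IDR_LENGTH = 15  # sequences must be strictly longer than this

Record = Tuple[str, str]


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records: Iterable[Record],
                    min_length: int = MIN_IDR_LENGTH):
    """Separate records that are long enough from those that are not."""
    kept: List[Record] = []
    dropped: List[Record] = []
    for header, seq in records:
        (kept if len(seq) > min_length else dropped).append((header, seq))
    return kept, dropped


# Toy input (hypothetical headers and sequences, not real IDRs).
example_lines = [
    ">toy_long",
    "GS" * 5,          # first line of a 20-residue sequence
    "GS" * 5,          # second line of the same sequence
    ">toy_short",
    "GS" * 7 + "G",    # exactly 15 residues, so it is not long enough
]

kept, dropped = split_by_length(read_fasta(example_lines))
for header, seq in dropped:
    print(f"skipping {header}: only {len(seq)} residues")
```

To check a real file, pass an open file handle instead of `example_lines`, for example `read_fasta(open("./sequences/test_input_RB_IDRs.fa"))`.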
Once the input feature matrix has been generated, you can run the asymmetric model using the following command:
- --input : features calculated from the preprocessing step
- --pairs : candidate pairs in tab-separated format. Protein names should be UniProt IDs.
- --output : output file path for the predictions
Run:
python3 asymmetricModel.py --input ./features/output_RB1_protR.txt --pairs ./exampleDataRB1/RB_test_interactions.txt --output ./exampleDataRB1/RB1_predictions_out.txt
This command uses the asymmetric model to test the interactions for the provided protein pairs. The predicted interactions are written to ./exampleDataRB1/RB1_predictions_out.txt.
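The pairs file is described only as tab-separated with UniProt IDs, so a quick sanity check before running the model may save a failed run. The sketch below assumes one pair per line, exactly two columns, and no header row; none of this is confirmed by the repository. The accession pattern is the one UniProt publishes for its accession numbers, and the sample rows are illustrative rather than taken from the example data.

```python
import csv
import io
import re

# Regular expression published by UniProt for accession numbers.
UNIPROT_ACCESSION = re.compile(
    r"^([OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9]([A-Z][0-9][A-Z0-9]{2}[0-9]){1,2})$"
)


def check_pairs(handle):
    """Return (valid_pairs, problems) for a tab-separated pairs file.

    Assumes two columns per line and no header (an assumption, not a
    documented format).
    """
    valid, problems = [], []
    reader = csv.reader(handle, delimiter="\t")
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != 2:
            problems.append((line_no, "expected 2 tab-separated columns"))
            continue
        first, second = (field.strip() for field in row)
        bad = [x for x in (first, second) if not UNIPROT_ACCESSION.match(x)]
        if bad:
            problems.append(
                (line_no, "not a UniProt accession: " + ", ".join(bad))
            )
        else:
            valid.append((first, second))
    return valid, problems


# Illustrative rows: one valid pair, one entry name instead of an
# accession, and one line separated by a space instead of a tab.
sample = io.StringIO(
    "P06400\tQ01094\n"
    "P06400\tRB1_HUMAN\n"
    "P06400 Q01094\n"
)
valid, problems = check_pairs(sample)
for line_no, message in problems:
    print(f"line {line_no}: {message}")
```

For a real file, open it with `newline=""` and pass the handle to `check_pairs`.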
To test the interactions with our unified model instead, use the following command:
- --input : features calculated from the preprocessing step
- --pairs : candidate pairs in tab-separated format. Protein names should be UniProt IDs.
- --output : output file path for the predictions
Run:
python3 unifiedModel.py --input ./features/output_RB1_protR.txt --pairs ./exampleDataRB1/RB_test_interactions_unified.txt --output ./exampleDataRB1/RB1_predictions_unified_out.txt
To predict interaction partners for a single input protein with our target prediction model, use the following command:
- --input : features calculated from the preprocessing step, including the input protein
- --target : path to a file with the UniProt ID of the input protein
- --output : output file path for the predictions
Run:
python3 targetModel.py --input ./features/output_RB1_protR.txt --target ./exampleDataRB1/target_file.txt --output ./exampleDataRB1/RB1_predictions_target_out.txt
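The feature extraction and prediction steps can also be chained from one script. The following sketch only assembles and optionally runs the commands shown in this README; the helper names are hypothetical, and it assumes the script is started from the repository root with Rscript and python3 on the PATH.

```python
import subprocess


def feature_command(fasta, features):
    """Command for the R feature extraction step."""
    return ["Rscript", "./R/extractFeatures_protR.R", fasta, features]


def model_command(script, features, second_flag, second_path, output):
    """Command for one of the model scripts.

    second_flag is "--pairs" for the asymmetric and unified models and
    "--target" for the target prediction model.
    """
    return ["python3", script, "--input", features,
            second_flag, second_path, "--output", output]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the chain at the first failing step
            subprocess.run(cmd, check=True)


features = "./features/output_RB1_protR.txt"
commands = [
    feature_command("./sequences/test_input_RB_IDRs.fa", features),
    model_command("asymmetricModel.py", features, "--pairs",
                  "./exampleDataRB1/RB_test_interactions.txt",
                  "./exampleDataRB1/RB1_predictions_out.txt"),
    model_command("targetModel.py", features, "--target",
                  "./exampleDataRB1/target_file.txt",
                  "./exampleDataRB1/RB1_predictions_target_out.txt"),
]

run_all(commands)  # dry run: prints the commands without executing them
```

Call `run_all(commands, dry_run=False)` to execute the steps in order.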