
A Machine Learning Model to Predict Phase Separation Driving Residues

Primary language: Perl. License: GNU General Public License v3.0 (GPL-3.0).

PSPHunter: A Machine Learning Model to Predict Phase Separation Driving Residues

Running the programme

Clone the repository and install the dependencies with:

git clone git@github.com:jsun9003/PSPHunter.git
cd PSPHunter
# install pixi
curl -fsSL https://pixi.sh/install.sh | bash
# setup the pixi environment
pixi install

Then run the prediction on your FASTA file:

pixi run predict -i fasta.fa
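
The README does not say which characters the predictor accepts in its input, so it can help to sanity-check a FASTA file before running `pixi run predict`. The following is a minimal sketch in Python and is not part of PSPHunter: the function names `read_fasta` and `find_problems` are hypothetical, and restricting sequences to the 20 standard amino-acid letters is an assumption.

```python
from typing import Dict, Iterable

# The 20 standard amino-acid one-letter codes. Whether PSPHunter accepts
# ambiguity codes such as X, B, or Z is not stated in this README, so
# treating them as problems is an assumption of this sketch.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}."""
    records: Dict[str, str] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("empty FASTA header")
            current = parts[0]  # record id is the first token of the header
            if current in records:
                raise ValueError(f"duplicate record id: {current}")
            records[current] = ""
        elif current is None:
            raise ValueError("sequence data found before any '>' header")
        else:
            records[current] += line.upper()
    return records


def find_problems(records: Dict[str, str]) -> Dict[str, str]:
    """Return {record_id: message} for records that look unsuitable."""
    problems: Dict[str, str] = {}
    for name, seq in records.items():
        if not seq:
            problems[name] = "empty sequence"
            continue
        bad = sorted(set(seq) - STANDARD_AA)
        if bad:
            problems[name] = "non-standard residues: " + "".join(bad)
    return problems
```

Passing an open file handle for `fasta.fa` to `read_fasta` and the result to `find_problems` gives an empty dictionary when every record passes these checks.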

About

Dissecting the functions and the regulatory mechanisms of intracellular phase separation is fundamental to understanding transcriptional control, cell fate transition and disease development. However, the driving residues, which affect phase separation the most and are therefore key to the functional study of protein phase separation, remain largely unidentified. We developed PSPHunter, a machine learning method for predicting driving residues in phase-separating proteins. Validation through in vivo and in vitro methods, including FRAP and saturation measurements, confirms PSPHunter's accuracy. Applying PSPHunter, we demonstrate that truncating just 6 driving residues in SOX2 and GATA3 significantly disrupts their phase separation properties. Furthermore, PSPHunter identified nearly 80% of the phase-separating proteins associated with diseases. Remarkably, frequently mutated pathological residues (glycine and proline) tend to localize within driving residues, exerting a significant influence on phase separation. PSPHunter thus emerges as a crucial tool to uncover driving residues, facilitating insights into phase separation mechanisms governing transcriptional control, cell fate transitions, and disease development.
--------------------------

figs/overview.jpg

To generate the essential features

Please install the following software first:

Additional Tutorial

Datasets for Training

  • Phase separation proteins used to construct PSPHunter are in the ./datasets folder.
  • Trained models, including the sequence-based model, the word2vec-based model, and the merged model, are stored in the ./train/ directory.

Generate the features

  • The code for generating all features, both sequence and functional, is located in scripts/featureExtraction. The merged output can be used for model training.

The demonstrations below use PSPHunter's word2vec sub-model. The complete model is stored in the 'trained model' folder.

Demonstration of phase separation probability prediction

cd Test

perl ../scripts/Standalone/predict_proteinProb.pl -i seq.fasta

Demonstration of phase separation driving residue prediction

cd Test

perl ../scripts/Standalone/predict_DrivingRegion.pl -i seq.fasta -o outfile

Demonstration of phase separation mutation effect prediction

cd Test

perl ../scripts/Standalone/predict_MutationEffect.pl -i seq.fasta -o outfile
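
The three demonstrations above follow the same pattern, so they can be scripted from one place. The sketch below is a hypothetical Python wrapper and is not shipped with PSPHunter: the script names and the `-i`/`-o` flags are copied from the commands above, while `TASKS`, `build_command`, and `run_task` are names invented for illustration. It assumes it is run from the repository root and that `perl` is on the `PATH`.

```python
import subprocess
from typing import List, Optional

# Script names and flags are copied from the README commands. The task keys
# and the function names below are made up for this sketch.
SCRIPT_DIR = "../scripts/Standalone"
TASKS = {
    # task name: (script file, whether the README passes -o outfile)
    "probability": ("predict_proteinProb.pl", False),
    "driving": ("predict_DrivingRegion.pl", True),
    "mutation": ("predict_MutationEffect.pl", True),
}


def build_command(task: str, fasta: str, outfile: Optional[str] = None) -> List[str]:
    """Build the argument list for one of the three standalone scripts."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    script, needs_out = TASKS[task]
    cmd = ["perl", f"{SCRIPT_DIR}/{script}", "-i", fasta]
    if needs_out:
        if outfile is None:
            raise ValueError(f"task '{task}' needs an output file (-o)")
        cmd += ["-o", outfile]
    elif outfile is not None:
        # The README shows no -o flag for this script, so none is passed.
        raise ValueError(f"task '{task}' is shown without -o in the README")
    return cmd


def run_task(task: str, fasta: str, outfile: Optional[str] = None,
             workdir: str = "Test") -> subprocess.CompletedProcess:
    """Run one task from inside workdir, mirroring `cd Test` above."""
    return subprocess.run(build_command(task, fasta, outfile),
                          cwd=workdir, check=True)
```

For example, `run_task("driving", "seq.fasta", "outfile")` runs the same command as the driving residue demonstration and raises `subprocess.CalledProcessError` if the script exits with a non-zero status.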

Availability

We have developed a user-friendly website, accessible at http://psphunter.stemcellding.org/, to facilitate the use of PSPHunter. The platform predicts phase-separating proteins and their driving regions using only protein sequences as input. It can also assess the impact of mutations on phase separation, so users can promptly identify mutations that disrupt normal phase separation functions.

Cite

Please cite our paper as:

Sun, J., Qu, J., Zhao, C. et al. Precise prediction of phase-separation key residues by machine learning. Nature Communications 15, 2662 (2024). https://doi.org/10.1038/s41467-024-46901-9 (IF: 16.6)

figs/graphAbstract.jpg

Contact

Please contact o.sj@live.com or raise an issue in the GitHub repository with any questions about installation or usage.