PSPire is a machine learning model that integrates residue-level and structure-level features to predict phase-separating proteins. It is written in Python 3 and is available as a command-line tool.
git clone https://github.com/TongjiZhanglab/PSPire.git
- Create conda environment:
conda create -n PSPire python=3.8
conda activate PSPire
- Install DSSP. Conda installation is recommended:
conda install -c salilab -y dssp=2.2.1
- Install PyMOL for sticker calculation:
conda install -c conda-forge -y pymol-open-source
- Install the following Python packages:
conda install -y xgboost=1.6.2
conda install -y scikit-learn=1.1.2
conda install -y biopython numpy pandas requests
- (Optional) The published PSPire models used the PSAIA software to calculate relative solvent accessible surface area (RSA), so it is recommended to prepare a running environment for PSAIA. To do so, first install Singularity and then use it to build a container image. If Singularity or the container image is not found, PSPire falls back to DSSP to calculate RSA.
IMPORTANT NOTE: Singularity has been renamed to Apptainer. The original Singularity repository is now kept only to archive the history in its release branches, and its master branch is not in a consistent state. Submit all current Singularity issues and pull requests to https://github.com/apptainer/apptainer.
- Install Singularity. As Singularity is written primarily in Go, install Go first. After installation, run the command below to check that Singularity has been installed successfully:
singularity -h
Note: to enable bash completion in new shells, add a line to ~/.bashrc that sources the Singularity bash completion file:
echo ". Singularity_Installation_Path/etc/bash_completion.d/singularity" >> ~/.bashrc # replace Singularity_Installation_Path with your installation path
- Build the Qt 4.8.6 library container image needed to run the PSAIA software:
cd /path/to/PSPire/data
singularity -d build psaia.simg docker://msoares/qt4-dev
- Add PSPire to $PATH. To enable global access to PSPire from any location on your system, it is recommended to add the PSPire directory to your system's $PATH environment variable:
chmod o+x /path/to/PSPire/PSPire.py
echo 'export PATH=/path/to/PSPire:$PATH' >> ~/.bashrc
source ~/.bashrc
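As a quick sanity check after the installation steps above, the following sketch reports which Python packages and external tools are visible from the active environment. It is not part of PSPire. The import names and the executable names (`mkdssp`, `pymol`, `singularity`) are assumptions about what the conda packages install, so adjust them to your setup. The hypothetical `rsa_backend` helper only mirrors the documented fallback (PSAIA when both Singularity and the `psaia.simg` image are present, DSSP otherwise) and does not reproduce PSPire's own detection logic.

```python
import importlib.util
import shutil
from pathlib import Path

# Import names assumed for the packages installed above.
PYTHON_MODULES = ["xgboost", "sklearn", "Bio", "numpy", "pandas", "requests"]
# Executable names are assumptions; adjust them to your installation.
EXECUTABLES = ["mkdssp", "pymol", "singularity", "PSPire.py"]


def check_environment():
    """Return {name: bool} telling whether each module or tool can be found."""
    status = {}
    for module in PYTHON_MODULES:
        status[module] = importlib.util.find_spec(module) is not None
    for exe in EXECUTABLES:
        status[exe] = shutil.which(exe) is not None
    return status


def rsa_backend(data_dir, singularity_found):
    """Mirror the documented fallback: PSAIA needs Singularity and the image."""
    image = Path(data_dir) / "psaia.simg"
    return "PSAIA" if singularity_found and image.is_file() else "DSSP"


if __name__ == "__main__":
    status = check_environment()
    for name, found in status.items():
        print(f"{name:12s} {'found' if found else 'MISSING'}")
    # /path/to/PSPire/data is the placeholder path used in the steps above.
    print("RSA backend:", rsa_backend("/path/to/PSPire/data", status["singularity"]))
```

A `MISSING` entry for `singularity` is not an error by itself, since the PSAIA environment is optional.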
- Pull the PSPire docker image:
docker pull houshuang2020/pspire:latest
- Replace the lib/psaia_run.py and PSPire.py files with the files under the docker_script folder.
- Make sure the PSAIA script is executable:
chmod o+x /path/to/PSPire/software/PSAIA/psa
Run PSPire.py -h to display the help information for PSPire.
Required parameters: the following parameters are required and you can only specify one of them.
-u UNIPROT [UNIPROT ...], --uniprot UNIPROT [UNIPROT ...]
UniProt IDs. Multiple IDs should be separated by space.
-f FILE, --file FILE List file with UniProt IDs or absolute path of protein pdb files.
Each ID or pdb file name should take one line.
-p PDBFILE, --pdbfile PDBFILE
PDB file of a protein.
-d DIRECTORY, --directory DIRECTORY
Absolute directory path of pdb files. The script will automatically
search files with pdb suffix under the specified directory.
Optional parameters:
-o OUTPUT, --output OUTPUT
Output file name. (Default: standard out)
-n NAME, --name NAME Project name. PSPire would use this name to create temporary
directory. (Default: PSPire)
--ignore When this parameter is set, PSPire would ignore intrinsically
disordered region(IDR)-related features for proteins with IDRs.
-s PHOS, --phos PHOS Absolute path of the user-defined phosphorylation (Phos) feature file.
If this parameter is specified, PSPire would use the model with the
Phos feature. User can check the demo directory of PSPire software
package for example format of the phos feature file. (Default: '')
--mobidb By default, PSPire would assume the pdb files you provide have pLDDT
score in the B-factor column calculated by AlphaFold, and use the
score to get IDRs. When this parameter is set, PSPire would get IDRs
by MobiDB-lite software.
-t THRESHOLD, --threshold THRESHOLD
Threshold of pLDDT score to get idr regions. (Default: 50)
-c CUTOFF, --cutoff CUTOFF
If the RSA percentage of a residue is greater than this cutoff, it
will be assigned as exposed surface residue, otherwise as buried
residue. (Default: 25)
-j JOBS, --jobs JOBS If mobidb parameter is on, PSPire would use the given number of cpus
to run MobiDB-lite. (Default: 10)
--resume By default, PSPire would clean up the temporary files and start from
the beginning. When resume is on, each re-run would use previous
temporary files to resume from the step it crashed.
--dont_remove By default, PSPire would clean up temporary files. When dont_remove is
on, PSPire would keep temporary files.
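To make the `-t` and `-c` options more concrete, here is a minimal sketch of the two thresholding rules. It is an illustration, not PSPire's code: the helper names are hypothetical, and treating residues with pLDDT strictly below the threshold as disordered is an assumption, since the help text only says the threshold is used to get IDRs. The RSA rule follows the help text directly (greater than the cutoff means exposed, otherwise buried).

```python
def classify_exposure(rsa_percent, cutoff=25.0):
    """Label a residue from its RSA percentage, as described for -c."""
    return "exposed" if rsa_percent > cutoff else "buried"


def low_plddt_segments(plddt, threshold=50.0):
    """Group consecutive residues whose pLDDT is below the threshold.

    `plddt` is a list of per-residue scores (residue 1 first). Returns
    1-based inclusive (start, end) tuples. ASSUMPTION: "below threshold"
    marks a residue as disordered; PSPire may apply further rules
    (for example a minimum segment length) that are not modeled here.
    """
    segments = []
    start = None
    for index, score in enumerate(plddt, start=1):
        if score < threshold:
            if start is None:
                start = index  # open a new low-confidence segment
        elif start is not None:
            segments.append((start, index - 1))  # close the open segment
            start = None
    if start is not None:
        segments.append((start, len(plddt)))  # segment runs to the C-terminus
    return segments


if __name__ == "__main__":
    print(classify_exposure(31.5))
    print(low_plddt_segments([30, 40, 90, 95, 45, 80, 20, 10]))
```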
- Specify UniProt IDs:
PSPire.py -u P09651
PSPire.py -u P09651 O00444
- Specify a list file with UniProt IDs or absolute paths of protein PDB files:
PSPire.py -f ${SOFTWAREPATH}/demo/PDB_files_list.txt
PSPire.py -f ${SOFTWAREPATH}/demo/uniprotID_list.txt
- Specify the PDB file of a protein:
PSPire.py -p ${SOFTWAREPATH}/demo/AF-A0A2R8QUZ1-F1-model_v2.pdb
- Specify the absolute directory path of PDB files:
PSPire.py -d ${SOFTWAREPATH}/demo
- Ignore intrinsically disordered region (IDR)-related features for proteins with IDRs:
PSPire.py -u P09651 --ignore
- Specify a user-defined phosphorylation feature file:
PSPire.py -u P09651 -s ${SOFTWAREPATH}/demo/phos_feature_example.csv
- Use the MobiDB-lite software to calculate IDRs:
PSPire.py -u P09651 --mobidb
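When many proteins are scored from a script, the invocations above can be assembled programmatically. The sketch below writes a list file (one UniProt ID per line, as `-f` expects) and builds the argument list. `write_id_list`, `build_command`, and `run_pspire` are hypothetical helper names, only flags documented in the help text are used, and the code assumes `PSPire.py` is on `$PATH`.

```python
import subprocess
from pathlib import Path


def write_id_list(ids, path):
    """Write one UniProt ID per line and return the absolute path."""
    path = Path(path).resolve()
    path.write_text("\n".join(ids) + "\n")
    return path


def build_command(list_file, output_csv, ignore_idr=False, use_mobidb=False):
    """Build a PSPire.py argument list from documented flags only."""
    cmd = ["PSPire.py", "-f", str(list_file), "-o", str(output_csv)]
    if ignore_idr:
        cmd.append("--ignore")
    if use_mobidb:
        cmd.append("--mobidb")
    return cmd


def run_pspire(cmd):
    """Run PSPire; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the command; call run_pspire(cmd) on a working installation.
    cmd = build_command("/abs/path/uniprot_ids.txt", "pspire_scores.csv")
    print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.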
By default, the results are written to standard output. Use the "-o" parameter to write them to a named file in CSV format. The "Score" column gives the phase separation (PS) propensity of the protein; a higher value corresponds to a higher likelihood of PS. The "Include_IDRs" column indicates whether the protein contains IDRs. The last three columns list the structured superficial (SSUP) regions and the positive and negative sticker regions, respectively.
Uniprot_ID,Score,Include_IDRs,SSUP_regions,Pos_Sticker_Regions,Neg_Sticker_Regions
P45973,0.9956347,Yes,"1-21,23-24,26-27,29,31-34,42-48,50-51,54-56,58,60-62,64-65,68-69,71-80,113-115,118-122,124-125,127,129-132,134,136,139,141,143-144,146-148,152,154-155,157-159,161-162,165-166,168-191","[(2, 3, 4, 5, 6, 7), (74, 75, 76, 77, 79), (29, 152)]","[(179, 180, 181), (12, 13, 14, 15, 16, 17), (18, 19, 20, 21, 23, 24, 42, 50, 56, 58)]"
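For downstream analysis, the output can be parsed with the Python standard library. The sketch below is one possible reader, based only on the example row above: it assumes the SSUP column is a comma-separated list of residue numbers and ranges, that the sticker columns are Python-style lists of tuples, and that "Include_IDRs" holds "Yes" for proteins with IDRs. The function names are hypothetical, and the embedded sample is a shortened version of the row shown above.

```python
import ast
import csv
import io


def parse_ranges(text):
    """Turn "1-21,23-24,29" into [(1, 21), (23, 24), (29, 29)]."""
    ranges = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        start, _, end = part.partition("-")
        ranges.append((int(start), int(end or start)))
    return ranges


def read_pspire_csv(handle):
    """Parse PSPire CSV output from an open text handle into dictionaries."""
    rows = []
    for row in csv.DictReader(handle):
        pos = row["Pos_Sticker_Regions"]
        neg = row["Neg_Sticker_Regions"]
        rows.append({
            "uniprot_id": row["Uniprot_ID"],
            "score": float(row["Score"]),
            "include_idrs": row["Include_IDRs"] == "Yes",
            "ssup_regions": parse_ranges(row["SSUP_regions"]),
            # literal_eval safely parses the list-of-tuples text.
            "pos_stickers": ast.literal_eval(pos) if pos else [],
            "neg_stickers": ast.literal_eval(neg) if neg else [],
        })
    return rows


# Shortened version of the example row, for illustration only.
SAMPLE = (
    "Uniprot_ID,Score,Include_IDRs,SSUP_regions,"
    "Pos_Sticker_Regions,Neg_Sticker_Regions\n"
    'P45973,0.9956347,Yes,"1-21,23-24,29",'
    '"[(2, 3, 4), (29, 152)]","[(179, 180, 181)]"\n'
)

if __name__ == "__main__":
    for record in read_pspire_csv(io.StringIO(SAMPLE)):
        print(record["uniprot_id"], record["score"], record["ssup_regions"])
```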
Pre-calculated PSPire scores for the following model organism proteomes, based on protein structures provided by AlphaFold, are available under the pre_calculated_scores folder. The model incorporating the Phos feature was used for human proteins, while the model without the Phos feature was used for proteins from other species. The files also contain the predicted scores of proteins when the IDR-related features are ignored. For human, the file additionally contains the relative surface exposure and secondary structure state of each residue.
- Arabidopsis thaliana
- Caenorhabditis elegans
- Candida albicans
- Danio rerio
- Dictyostelium discoideum
- Drosophila melanogaster
- Escherichia coli
- Glycine max
- Homo sapiens
- Methanocaldococcus jannaschii
- Mus musculus
- Oryza sativa
- Rattus norvegicus
- Saccharomyces cerevisiae
- Schizosaccharomyces pombe
- Zea mays
If you use the code or data in this repository, please cite:
@article{Hou2024,
author = {Hou, Shuang and Hu, Jiaojiao and Yu, Zhaowei and Li, Dan and Liu, Cong and Zhang, Yong},
title = {Machine learning predictor PSPire screens for phase-separating proteins lacking intrinsically disordered regions},
journal = {Nature Communications},
volume = {15},
number = {1},
pages = {2147},
year = {2024},
doi = {10.1038/s41467-024-46445-y},
url = {https://doi.org/10.1038/s41467-024-46445-y},
}