
Structure-based self-supervised learning enables ultrafast prediction of stability changes upon mutations





pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip3 install -r requirements.txt


To use Pythia, run it from the command line with the options described below:

Basic Usage

cd pythia
python masked_ddg_scan.py

By default, this will process files in the directory ../s669_AF_PDBs/ using cuda:0 (GPU 0) if available.
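
The "cuda:0 if available" behaviour can be sketched as follows. This is a minimal illustration, not the script's actual code: `resolve_device` is a hypothetical helper, and the only PyTorch call it relies on is `torch.cuda.is_available()`.

```python
def resolve_device(requested: str = "cuda:0") -> str:
    """Return the requested device, falling back to "cpu" when CUDA is unusable.

    Hypothetical helper for illustration only; masked_ddg_scan.py may
    handle device selection differently.
    """
    if requested == "cpu":
        return "cpu"
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    # Keep the requested GPU only if a CUDA device is actually usable.
    return requested if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(resolve_device("cuda:0"))
```

On a machine without a usable GPU this prints cpu; passing --device cpu explicitly avoids relying on any fallback.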

Command Line Options

  • --input_dir: Specifies the directory path containing the PDB files. Default is ../s669_AF_PDBs/.


    python masked_ddg_scan.py --input_dir "/path/to/directory/"
  • --pdb_filename: If you want to process a single PDB file instead of a directory, specify its path with this option.


    python masked_ddg_scan.py --pdb_filename "/path/to/file.pdb"
  • --check_plddt: Use this flag if you want to filter PDB files based on their pLDDT value. Files with a pLDDT value less than the specified cutoff (see below) will be ignored.


    python masked_ddg_scan.py --check_plddt
  • --plddt_cutoff: Specifies the pLDDT cutoff value if --check_plddt is used. Default is 95.


    python masked_ddg_scan.py --check_plddt --plddt_cutoff 90
  • --n_jobs: Indicates the number of parallel jobs to run. Default is 2.


    python masked_ddg_scan.py --n_jobs 4
  • --device: Specifies the device to use for computation. By default, it will use cuda:0 (GPU 0). If you want to use CPU or another GPU, specify it here. Valid values include cuda:0, cuda:1, ... for GPUs, or cpu for the CPU.


    python masked_ddg_scan.py --device cpu
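
For reference, the options above could be declared with argparse roughly as shown here. The flag names and defaults come from the list above; the function name `build_parser`, the argument types, and the help strings are assumptions, and the real script may define its options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Flag names and defaults mirror the documented options;
    # types and help texts are illustrative assumptions.
    parser = argparse.ArgumentParser(
        description="Sketch of the masked_ddg_scan.py command line options"
    )
    parser.add_argument("--input_dir", type=str, default="../s669_AF_PDBs/",
                        help="directory containing the PDB files")
    parser.add_argument("--pdb_filename", type=str, default=None,
                        help="process a single PDB file instead of a directory")
    parser.add_argument("--check_plddt", action="store_true",
                        help="skip files whose pLDDT is below the cutoff")
    parser.add_argument("--plddt_cutoff", type=float, default=95,
                        help="pLDDT cutoff used with --check_plddt")
    parser.add_argument("--n_jobs", type=int, default=2,
                        help="number of parallel jobs")
    parser.add_argument("--device", type=str, default="cuda:0",
                        help="cuda:0, cuda:1, ... or cpu")
    return parser


if __name__ == "__main__":
    # Parse a fixed argument list so the sketch runs without a real command line.
    args = build_parser().parse_args(
        ["--check_plddt", "--plddt_cutoff", "90", "--device", "cpu"]
    )
    print(args)
```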


  1. Process all PDB files in the directory /path/to/directory/, using the first GPU and checking pLDDT values with a cutoff of 90:

    python masked_ddg_scan.py --input_dir "/path/to/directory/" --check_plddt --plddt_cutoff 90 --device cuda:0
  2. Process a single PDB file /path/to/file.pdb using the CPU:

    python masked_ddg_scan.py --pdb_filename "/path/to/file.pdb" --device cpu
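
Example 1 relies on the pLDDT filter. A sketch of what such a filter might look like is given below. AlphaFold-style PDB files store per-residue pLDDT in the B-factor field of each ATOM record; taking the mean over C-alpha atoms as the file-level score is an assumption, as are the helper names `mean_plddt` and `passes_plddt`. The script itself may compute the value differently.

```python
def mean_plddt(pdb_text: str) -> float:
    """Average the B-factor field over C-alpha atoms.

    In AlphaFold-style PDB files the B-factor field (columns 61-66)
    holds the per-residue pLDDT. Using the mean as the file-level
    score is an assumption made for this sketch.
    """
    scores = []
    for line in pdb_text.splitlines():
        # Atom name occupies columns 13-16 of a fixed-width ATOM record.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    return sum(scores) / len(scores)


def passes_plddt(pdb_text: str, cutoff: float = 95.0) -> bool:
    # Files scoring below the cutoff are ignored, as described for --check_plddt.
    return mean_plddt(pdb_text) >= cutoff
```

A file's contents can be passed in with `pathlib.Path("/path/to/file.pdb").read_text()`.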

The Megascale, S2648, and S669 dataset files contain predictions and labels.
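
Once predictions and labels are loaded as two equal-length lists, their agreement can be summarized with a correlation coefficient. The sketch below computes the Pearson correlation in plain Python. The file format and column names of the dataset files are not described here, so loading is left out, and `pearson_r` is a hypothetical helper rather than part of Pythia.

```python
import math
from typing import Sequence


def pearson_r(predictions: Sequence[float], labels: Sequence[float]) -> float:
    """Pearson correlation between predicted and labelled values."""
    if len(predictions) != len(labels) or len(predictions) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_p = sum(predictions) / len(predictions)
    mean_l = sum(labels) / len(labels)
    # Covariance and variances are left unnormalized; the factors cancel in the ratio.
    cov = sum((p - mean_p) * (l - mean_l) for p, l in zip(predictions, labels))
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    var_l = sum((l - mean_l) ** 2 for l in labels)
    if var_p == 0 or var_l == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_l)
```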


  1. Download the preprocessed training files (CATH dataset or BioA dataset) from Google Drive, then submit the training job with Slurm:
    sbatch train_model.sh