EHIGN_PLA

Primary language: Python. License: MIT.

Interaction-Based Inductive Bias in Graph Neural Networks: Enhancing Protein-Ligand Binding Affinity Predictions from 3D Structures

Note

  • Implementations of the other baselines can be found in the GIGN repository.
  • This repository contains the source code for protein-ligand binding affinity (PLA) prediction. For structure-based virtual screening (SBVS), please refer to our dedicated repository, EHIGN_SBVS, on GitHub.

Dataset

All data used in this paper are publicly available at the following locations:

  • PDBbind v2016 and v2019: pdbbind
  • 2013 and 2016 core sets: casf

The preprocessed data can be downloaded from Graphs.

Requirements

dgl==0.9.0
networkx==2.5
numpy==1.19.2
pandas==1.1.5
pymol==0.1.0
rdkit==2022.3.5
scikit_learn==1.1.2
scipy==1.5.2
torch==1.10.2
tqdm==4.63.0
openbabel==3.3.1 (conda install -c conda-forge openbabel)

Alternatively, install the environment using the provided YAML file at ./environment.yaml.
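
After installing, it can help to confirm that the active environment matches the pins listed above. The following is a minimal sketch, not part of the repository: the helper names `installed_versions` and `find_mismatches` are hypothetical, and the distribution names are copied from the list as written, so they may differ from the names registered by pip or conda. Packages installed through conda (for example openbabel) may not expose pip metadata and would then be reported as missing.

```python
from importlib import metadata

# Pins copied from the Requirements list above (names as written there).
PINNED = {
    "dgl": "0.9.0",
    "networkx": "2.5",
    "numpy": "1.19.2",
    "pandas": "1.1.5",
    "pymol": "0.1.0",
    "rdkit": "2022.3.5",
    "scikit_learn": "1.1.2",
    "scipy": "1.5.2",
    "torch": "1.10.2",
    "tqdm": "4.63.0",
    "openbabel": "3.3.1",
}


def installed_versions(names):
    """Look up the installed version of each distribution (None if not found)."""
    versions = {}
    for name in names:
        try:
            # Drop local build labels such as "+cu113" before comparing.
            versions[name] = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def find_mismatches(pinned, installed):
    """Return {name: (wanted, found)} for every missing or differing version."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = installed.get(name)  # None means "not installed"
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(PINNED, installed_versions(PINNED))` returns an empty dictionary when every pinned package is present at the listed version.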

Descriptions of Folders and Files

  • ./data: Contains information about the datasets. Download the preprocessed datasets and organize them here as described under Step-by-step Running.
  • ./config: Parameters used in EHIGN.
  • ./log: Logger.
  • ./model: Contains model checkpoints and training records.
  • Scripts and Implementations: Various Python files implementing models, preprocessing, training, and testing.

Step-by-step Running

1. Model Training

  • Download the preprocessed datasets and organize them in the ./data folder.
  • Run python train.py.

2. Model Testing

  • Run python test.py (modify file paths in the source code if necessary).

3. Process Raw Data

  • Run a demo using provided examples:
    • python preprocess_complex.py
    • python graph_constructor.py
    • python train_example.py

4. Test the Trained Model on External Test Sets

  • Organize the data as follows:

      -data
        -external_test
          -pdb_id
            -pdb_id_ligand.mol2
            -pdb_id_protein.pdb

  • Execute the following commands:

    • python preprocess_complex.py
    • python graph_constructor.py
    • python test.py
    • Modify the file paths in the source code if necessary.
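
The layout and command sequence above can be sketched as a small driver script. This is an illustration under stated assumptions, not code from the repository: `find_incomplete_complexes` and `run_pipeline` are hypothetical helpers, and the three scripts are assumed to be run from the repository root without arguments, as in the commands listed above.

```python
import subprocess
import sys
from pathlib import Path

# Script order taken from the commands listed above.
PIPELINE = ["preprocess_complex.py", "graph_constructor.py", "test.py"]


def find_incomplete_complexes(root):
    """Return {pdb_id: [missing file names]} for complex folders under root."""
    problems = {}
    for complex_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        pdb_id = complex_dir.name
        # Each folder is expected to hold one ligand and one protein file.
        expected = [f"{pdb_id}_ligand.mol2", f"{pdb_id}_protein.pdb"]
        missing = [name for name in expected if not (complex_dir / name).is_file()]
        if missing:
            problems[pdb_id] = missing
    return problems


def run_pipeline(scripts, run=subprocess.run):
    """Run each script in order; return the first failing script, or None."""
    for script in scripts:
        result = run([sys.executable, script])
        if result.returncode != 0:
            return script  # stop early so later steps do not use stale output
    return None
```

A typical use would be to call `find_incomplete_complexes("data/external_test")` first and only call `run_pipeline(PIPELINE)` when it returns an empty dictionary.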

5. Cold Start Settings

  • Use datasets found in the ./cold_start_data folder.
  • Once the original training set has been processed, run train_random.py, train_scaffold.py, and train_sequence.py.
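
The script names suggest random, scaffold-based, and sequence-based splits. As a generic illustration of the idea behind a cold-start split (not the authors' procedure; the splits shipped in ./cold_start_data are the ones to use), the sketch below keeps every group, such as a ligand scaffold or a protein sequence cluster, entirely on one side of the split. The function `group_split` and its `group_of` mapping are hypothetical, and group labels are assumed to be precomputed.

```python
import random
from collections import defaultdict


def group_split(sample_ids, group_of, test_fraction=0.2, seed=0):
    """Split samples so that no group appears in both train and test."""
    groups = defaultdict(list)
    for sample in sample_ids:
        groups[group_of[sample]].append(sample)

    # Shuffle group keys reproducibly, then assign whole groups at a time.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    target = test_fraction * len(sample_ids)
    train, test = [], []
    for key in keys:
        # Fill the test side group by group until it reaches the target size.
        (test if len(test) < target else train).extend(groups[key])
    return train, test
```

Because whole groups are assigned at once, the test set size only approximates `test_fraction`; the guarantee is that the train and test group labels never overlap.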