Interaction-Based Inductive Bias in Graph Neural Networks: Enhancing Protein-Ligand Binding Affinity Predictions from 3D Structures
This repository contains the source code for structure-based virtual screening (SBVS). For protein-ligand affinity (PLA) predictions, please refer to our dedicated repository at EHIGN_PLA on GitHub.
The LIT-PCBA dataset is publicly available at the following locations:
- Original LIT-PCBA [1]: LIT-PCBA
- Docked data (with 3D structures of the small-molecule compounds) [2]: 3D Structures
Preprocessed data (molecular graphs) can be downloaded from:
The following Python packages are required:
dgl==0.9.0
networkx==2.5
numpy==1.19.2
pandas==1.1.5
pymol==0.1.0
rdkit==2022.3.5
scikit_learn==1.1.2
scipy==1.5.2
torch==1.10.2
tqdm==4.63.0
openbabel==3.3.1 (conda install -c conda-forge openbabel)
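To check an existing environment against the pinned versions above, a small standard-library script is enough. This is a minimal sketch and is not part of the repository. It relies on importlib.metadata (Python 3.8+), which only sees packages that ship pip-style metadata. Conda-installed Open Babel (and possibly PyMOL) may therefore be reported as missing even when it imports fine. Names are looked up as written, so on some Python versions you may need the canonical form (e.g. scikit-learn instead of scikit_learn).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Pinned requirements copied from the list above. The conda note on the
# openbabel line is ignored by the parser below.
PINNED = """
dgl==0.9.0
networkx==2.5
numpy==1.19.2
pandas==1.1.5
pymol==0.1.0
rdkit==2022.3.5
scikit_learn==1.1.2
scipy==1.5.2
torch==1.10.2
tqdm==4.63.0
openbabel==3.3.1 (conda install -c conda-forge openbabel)
"""

# Matches "name==version" at the start of a line; the version stops at
# whitespace or an opening parenthesis.
PIN_RE = re.compile(r"^\s*([A-Za-z0-9_.\-]+)==([^\s(]+)")


def parse_pins(text):
    """Return {distribution_name: pinned_version} for every 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        match = PIN_RE.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def check_pins(pins):
    """Map each name to (pinned, installed or None, whether they match)."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # not installed, or installed without pip metadata
        report[name] = (pinned, installed, installed == pinned)
    return report


if __name__ == "__main__":
    for name, (pinned, installed, ok) in check_pins(parse_pins(PINNED)).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:14s} pinned={pinned:10s} installed={installed} [{status}]")
```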
Alternatively, create the environment from the provided YAML file at ./environment.yaml (for example, with conda env create -f environment.yaml).
- ./config: Parameters used in EHIGN.
- ./log: Logger.
- ./model: Contains several trained models for reproducing results.
- CIGConv.py, NIGConv.py, EHIGN.py: Implementations of CIGConv, NIGConv, and EHIGN.
- HGC.py: Heterogeneous graph neural network implementation (modified from the DGL source code).
- preprocess_complex.py: Prepare input complexes.
- graph_constructor.py: Convert protein-ligand complexes into heterogeneous graphs.
- train.py: Train the EHIGN model.
- test.py: Use the models in the ./model directory for prediction.
Download processed data from Graphs 1 and Graphs 2.
Organize the data as follows:
-docking_poses
  -ALDH1_4x4l
    -train
      -ALDH1_4x4l_decoys_22407376-EHIGN.dgl
      ...
    -val
      ...
  -FEN1_5fv7
    ...
  -GBA_2v3e
    ...
  ...
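Before training, it can help to confirm that the graphs were unpacked into the expected <target>/<split>/*.dgl layout. The sketch below assumes that layout. It also infers the file-name pattern <target>_<category>_<id>-EHIGN.dgl from the single example above, so treat that pattern as an assumption. index_graphs and parse_graph_name are hypothetical helpers, not functions from this repository.

```python
import sys
from collections import defaultdict
from pathlib import Path

GRAPH_SUFFIX = "-EHIGN.dgl"  # suffix taken from the example file name above


def parse_graph_name(filename):
    """Split e.g. 'ALDH1_4x4l_decoys_22407376-EHIGN.dgl' into
    (target, category, compound_id). Assumed pattern, inferred from one example."""
    if not filename.endswith(GRAPH_SUFFIX):
        raise ValueError(f"unexpected graph file name: {filename}")
    stem = filename[: -len(GRAPH_SUFFIX)]
    # rsplit from the right keeps underscores inside the target name
    # (e.g. 'ALDH1_4x4l') intact
    target, category, compound_id = stem.rsplit("_", 2)
    return target, category, compound_id


def index_graphs(data_root):
    """Return {(target, split): [graph paths]} for data_root/<target>/<split>/*.dgl."""
    index = defaultdict(list)
    for path in sorted(Path(data_root).glob("*/*/*.dgl")):
        target_dir, split = path.parent.parent.name, path.parent.name
        index[(target_dir, split)].append(path)
    return dict(index)


if __name__ == "__main__":
    # Pass the docking_poses directory as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "docking_poses"
    for (target, split), paths in index_graphs(root).items():
        print(f"{target:12s} {split:5s} {len(paths)} graphs")
```

An empty or lopsided count for a target usually means a download was incomplete or extracted one level too deep.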
The ./model directory contains seven trained models for reproducing results.
To train the model, run python train.py --data_root your_own_data_path/docking_poses
To make predictions with the trained models, run python test.py --data_root your_own_data_path/docking_poses
By default, test.py uses the seven trained models in the ./model directory.
To build the graphs from raw complexes yourself, first run python preprocess_complex.py --data_root your_own_data_path/docking_poses
Then, run python graph_constructor.py --data_root your_own_data_path/docking_poses to generate the graphs.
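Starting from raw complexes, the preprocessing, training and test commands form a linear pipeline. The hypothetical wrapper below is not included in the repository. It runs the documented scripts in that order, passing only the documented --data_root flag, and defaults to a dry run that just prints the commands. If you use the downloaded graphs, drop the first two steps. To only reproduce results with the provided models, keep just test.py.

```python
import subprocess
import sys

# Order assumed from the instructions: build graphs first, then train, then test.
PIPELINE = ["preprocess_complex.py", "graph_constructor.py", "train.py", "test.py"]


def build_commands(data_root, steps=PIPELINE, python=sys.executable):
    """Return one argv list per step, e.g. [python, 'train.py', '--data_root', root]."""
    return [[python, script, "--data_root", data_root] for script in steps]


def run_pipeline(data_root, steps=PIPELINE, dry_run=True):
    """Print each step and, unless dry_run, execute it; stop at the first failure."""
    commands = build_commands(data_root, steps)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a script exits non-zero
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Pass --run to actually execute the scripts; the default is a dry run.
    root = "your_own_data_path/docking_poses"
    run_pipeline(root, dry_run="--run" not in sys.argv)
```

For example, run_pipeline(root, steps=PIPELINE[2:], dry_run=False) would train and then test on graphs that already exist under root.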
[1] Tran-Nguyen V K, Jacquemard C, Rognan D. LIT-PCBA: an unbiased data set for machine learning and virtual screening[J]. Journal of Chemical Information and Modeling, 2020, 60(9): 4263-4273.
[2] Shen C, Weng G, Zhang X, et al. Accuracy or novelty: what can we gain from target-specific machine-learning-based scoring functions in virtual screening?[J]. Briefings in Bioinformatics, 2021, 22(5): bbaa410.