Source code for our IJCAI 2020 paper Communicative Representation Learning on Attributed Molecular Graphs
The code builds on DMPNN. Many thanks to its authors for sharing their code!
Dataset | BBBP | Tox21 | Sider | ClinTox | ESOL | FreeSolv |
---|---|---|---|---|---|---|
Task | Classification | Classification | Classification | Classification | Regression | Regression |
RF | 0.788 | 0.619 | 0.572 | 0.544 | 1.176 | 2.048 |
FNN | 0.899 | 0.788 | 0.652 | 0.688 | 2.152 | 3.043 |
GCN | 0.690 | 0.829 | 0.638 | 0.807 | 0.970 | 1.400 |
Weave | 0.671 | 0.820 | 0.581 | 0.832 | 0.610 | 1.220 |
RGAT | 0.875 | 0.821 | 0.621 | 0.841 | 0.731 | 1.338 |
N-Gram | 0.890 | 0.842 | - | 0.870 | 0.718 | 1.371 |
MPNN | 0.910±0.032 | 0.844±0.014 | 0.641±0.014 | 0.881±0.037 | 0.702±0.042 | 1.242±0.249 |
DMPNN w/o FP | 0.913±0.026 | 0.845±0.015 | 0.646±0.016 | 0.894±0.027 | 0.665±0.052 | 1.157±0.105 |
CMPNN w/o FP (ours) | 0.963±0.003 | 0.856±0.007 | 0.666±0.007 | 0.933±0.012 | 0.819±0.147 |
Prediction results of CMPNN, its variants, and baselines on six chemical graph datasets. We used 5-fold cross-validation with random splits, repeated the experiments five times on each task, and report the mean and standard deviation of AUC (classification) or RMSE (regression). For the methodology and scaffold-split results, please refer to the paper.
* The authors note that the ESOL dataset used previously contained a mistake; the result has been corrected. Thanks to Shengchao and Sorkun for their friendly reminders.
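The evaluation protocol described above (random 5-fold splits, scores summarized as mean ± standard deviation) can be sketched in plain Python as follows. This is only an illustration, not the repository's implementation: the function names `random_kfold`, `rmse`, and `summarize` are hypothetical, and the use of the sample standard deviation is an assumption, since the README does not say which estimator was used.

```python
import math
import random
import statistics


def random_kfold(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs for a random k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds nearly equal folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def summarize(scores):
    """Return (mean, std) of per-run scores.

    Assumption: sample standard deviation (n - 1 in the denominator).
    """
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std


def format_entry(scores):
    """Format scores the way the table cells are written, e.g. 0.702±0.042."""
    mean, std = summarize(scores)
    return f"{mean:.3f}±{std:.3f}"
```

Each table cell would then correspond to `format_entry` applied to the list of per-run scores for one model and one dataset.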
- cuda >= 8.0
- cuDNN
- RDKit
- torch >= 1.2.0
Tip: to install RDKit quickly, run:
conda install -c rdkit rdkit
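Before training, it can be useful to confirm that the Python-side requirements listed above are present. The following is a minimal sketch that uses only the standard library; `check_environment` and its helpers are hypothetical names, not part of this repository, and the script does not check CUDA or cuDNN.

```python
import importlib.util
import re
from importlib import metadata


def parse_version(text):
    """Extract the leading numeric part of a version, e.g. '1.13.1+cu117'."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def version_at_least(installed, required):
    """Compare two version strings after padding them to equal length."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(min_torch="1.2.0"):
    """Report whether rdkit and torch are importable and torch is new enough."""
    report = {}
    found = importlib.util.find_spec("rdkit") is not None
    report["rdkit"] = "found" if found else "missing"
    if importlib.util.find_spec("torch") is None:
        report["torch"] = "missing"
    else:
        try:
            installed = metadata.version("torch")
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata is available.
            report["torch"] = "found (version unknown)"
        else:
            ok = version_at_least(installed, min_torch)
            report["torch"] = f"{installed} ({'ok' if ok else 'too old'})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```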
To run the demo code on dataset BBBP, run:
python train_demo.py
To train a model, run:
python train.py --data_path <path> --dataset_type <type> --num_folds 5 --gpu 0 --epochs 30
where <path> is the path to a CSV file containing a dataset, and <type> is either "classification" or "regression", depending on the type of the dataset.
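As an illustration of what a small input file and the matching command line might look like, here is a hedged sketch. The CSV layout (a header row, a SMILES string in the first column, target values in the remaining columns) is an assumption based on the DMPNN code this repository builds on, so check it against the bundled datasets. The column names and target values below are placeholders, not real measurements, and the helper names are hypothetical. Only the flags shown in the command above are used.

```python
import csv
import io


def dataset_to_csv_text(header, rows):
    """Serialize a toy dataset to CSV text (assumed layout: SMILES, targets...)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def build_train_command(data_path, dataset_type, num_folds=5, gpu=0, epochs=30):
    """Assemble the training command from the flags shown in this README."""
    if dataset_type not in ("classification", "regression"):
        raise ValueError("dataset_type must be 'classification' or 'regression'")
    return [
        "python", "train.py",
        "--data_path", data_path,
        "--dataset_type", dataset_type,
        "--num_folds", str(num_folds),
        "--gpu", str(gpu),
        "--epochs", str(epochs),
    ]


if __name__ == "__main__":
    # Placeholder rows: ethanol and benzene with made-up target values.
    text = dataset_to_csv_text(
        ["smiles", "target"],
        [["CCO", 0.0], ["c1ccccc1", 1.0]],
    )
    print(text)
    # "toy_dataset.csv" is a hypothetical file name.
    print(" ".join(build_train_command("toy_dataset.csv", "regression")))
```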
To make predictions with a trained model, run:
python predict.py --data_path <path> --checkpoint_dir <dir>
where <dir> is the directory where the model checkpoint(s) are saved, and <path> is the path to the SMILES dataset.
Please cite the following paper if you use this code in your work.
@inproceedings{ijcai2020-392,
title = {Communicative Representation Learning on Attributed Molecular Graphs},
author = {Song, Ying and Zheng, Shuangjia and Niu, Zhangming and Fu, Zhang-hua and Lu, Yutong and Yang, Yuedong},
booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on
Artificial Intelligence, {IJCAI-20}},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
editor = {Christian Bessiere},
pages = {2831--2838},
year = {2020},
month = {7},
note = {Main track},
doi = {10.24963/ijcai.2020/392},
url = {https://doi.org/10.24963/ijcai.2020/392},
}