Source code for our paper " GraphRXN: A Novel Representation for Reaction Prediction".
The code was built based on CMPNN (https://github.com/SY575/CMPNN ), DeepReac (https://github.com/bm2-lab/DeepReac ), YieldBert (https://github.com/bm2-lab/DeepReac ).
Thanks a lot for their sharing.
Figure 1. Model architecture of GraphRXN
Figure 2. General workflow of HTE process
Figure 3. Reaction scheme and substrate scope
Figure 4. Distribution of Ratio(UV), where A represents amine, and B represents bromide
Figure 5. The scatter plots of GraphRXN on the entire dataset
Model Performance of three public datasets over ten-fold CV on test set
Dataset
Methods
R2
MAE
RMSE
Dataset1
GraphRXN-concat
0.951
4.30
5.98
Dataset1
GraphRXN-sum
0.937
4.85
6.80
Dataset1
Yield-BERT
0.951
4.00
6.03
Dataset1
DeepReac+
0.922
5.25
7.54
Dataset2
GraphRXN-concat
0.844
7.94
11.08
Dataset2
GraphRXN-sum
0.838
8.09
11.29
Dataset2
Yield-BERT
0.815
8.13
12.08
Dataset2
DeepReac+
0.827
8.06
11.65
Dataset3
GraphRXN-concat
0.892
0.16
0.23
Dataset3
GraphRXN-sum
0.881
0.18
0.24
Dataset3
Yield-BERT
0.886
0.16
0.24
Dataset3
DeepReac+
0.853
0.18
0.25
Model performance of in-house dataset over 5-fold CV on test set
Groupe
Size
methods
R2
MAE
RMSE
Entire
1558
GraphRXN-concat
0.713
0.06
0.09
Entire
1558
GraphRXN-sum
0.704
0.06
0.09
Entire
1558
Yield-BERT
0.645
0.10
0.07
Entire
1558
DeepReac+
0.610
0.07
0.10
G1
317
GraphRXN-concat
0.661
0.08
0.11
G1
317
GraphRXN-sum
0.462
0.11
0.14
G1
317
Yield-BERT
0.718
0.07
0.10
G1
317
DeepReac+
0.551
0.09
0.13
G2
419
GraphRXN-concat
0.629
0.05
0.07
G2
419
GraphRXN-sum
0.592
0.06
0.07
G2
419
Yield-BERT
0.512
0.06
0.08
G2
419
DeepReac+
0.528
0.06
0.08
G3
401
GraphRXN-concat
0.802
0.06
0.08
G3
401
GraphRXN-sum
0.775
0.06
0.08
G3
401
Yield-BERT
0.785
0.06
0.08
G3
401
DeepReac+
0.745
0.07
0.09
G4
421
GraphRXN-concat
0.459
0.08
0.12
G4
421
GraphRXN-sum
0.419
0.09
0.12
G4
421
Yield-BERT
0.503
0.08
0.11
G4
421
DeepReac+
0.23
0.10
0.14
conda env create -f GraphRXN.yaml ### Create GraphRXN env
conda activate GraphRXN
python reaction_train.py --data_path data_scaler/Buchward-Hartwig/random_split/FullCV_01_train_temp_scaler.csv
--separate_test_path data_scaler/Buchward-Hartwig/random_split/FullCV_01_test_temp_scaler.csv
--dataset_type regression
--num_folds 1
--gpu 0
--epochs 100
--batch_size 128
--save_dir ./result/Buchward/concat_01_temp
--metric r2
--reaction_agg_method concat
Note: If choosing summation aggregation method, please specify --reaction_agg_method sum
cd DeepReac
conda env create -f DeepReact.yaml ### Create DeepReac+ env
conda activate DeepReact
### train and predict
python DeepReac_train.py -train data_scaler/Buchward-Hartwig/random_split/FullCV_01_train_temp_scaler.csv
-test data_scaler/Buchward-Hartwig/random_split/FullCV_01_test_temp_scaler.csv
-epochs 100
-stats ./result_scaler/Buchward_01_test_stats.csv
cd Yield-BERT
conda env create -f rxnyields.yaml ### create Yield-BERT env
conda activate rxnyields
cd yield-BERT_baseline
### For Dataset 1 (Buchwald) training
python launch_buchwald_hartwig_training.py
### For Dataset 2 (Suzuki) training
python lauch_suzuki_miyaura_training.py
### For Dataset 3 (Denmark) training
python data3_training_10cv.py
### For in-house dataset training
python inhouse_data_transform.py