MetaRF: Differentiable Random Forest for Reaction Yield Prediction with a Few Trails

This is the implementation for our paper "MetaRF: Differentiable Random Forest for Reaction Yield Prediction with a Few Trails".

Workflow

Setup

Check dependencies

 - tensorflow==2.9.2
 - kennard-stone==1.1.2
 - numpy
 - pandas
 - scikit-learn (imported as sklearn)
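
The dependency list above can be checked programmatically before running the scripts. The following is a minimal sketch using only the Python standard library; the helper name check_dependencies and the REQUIRED table are hypothetical and not part of this repository.

```python
from importlib import metadata

# Pinned versions come from the dependency list; None means "any version".
REQUIRED = {
    "tensorflow": "2.9.2",
    "kennard-stone": "1.1.2",
    "numpy": None,
    "pandas": None,
    "scikit-learn": None,  # PyPI name of the package imported as sklearn
}


def check_dependencies(required=REQUIRED):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name, wanted in required.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if wanted is not None and found != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_dependencies() or ["all dependencies look fine"]:
        print(line)
```

An exact-match comparison is used because the list pins exact versions for tensorflow and kennard-stone; other versions may still work but are untested here.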

Clone this repo

git clone https://github.com/Nikki0526/MetaRF.git

Data preprocessing

Run $ data_preprocessing.py to preprocess the data.
This step includes the random forest module and the dimension-reduction module.
The original reaction and yield data used in this paper are from [1], [2], and [3].
We also provide the preprocessed data in /data (two datasets are too large for GitHub and can be downloaded from Google Drive).

Model training

Run $ train.py to perform meta-training and save the model.
Our trained model can be downloaded from Google Drive.

Model fine-tuning and testing

Run $ test.py to perform few-shot fine-tuning, dimension-reduction-based sampling, and model evaluation.
This repository uses relative paths, so please place the downloaded model in the /model folder.
[update] We have added more baseline comparisons in $ baseline.py, including RXNFP [4] and DRFP [5], among others.
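
The dimension-reduction-based sampling step is not spelled out in this README. Since kennard-stone appears in the dependency list, one plausible reading is that the few-shot samples are picked with the Kennard-Stone rule on the reduced coordinates. The following is a minimal pure-Python sketch of that rule under this assumption; it is not the code in test.py, and the function name kennard_stone and the toy points are hypothetical.

```python
import math


def kennard_stone(points, k):
    """Return indices of k points chosen by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(points)")

    # Pairwise Euclidean distances (fine for small candidate pools).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Start from the two points that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]

    # For every point, track its distance to the nearest selected point.
    nearest = [min(dist[i][first], dist[i][second]) for i in range(n)]

    while len(selected) < k:
        # Pick the candidate that is farthest from everything selected so far.
        chosen = set(selected)
        nxt = max((i for i in range(n) if i not in chosen), key=lambda i: nearest[i])
        selected.append(nxt)
        nearest = [min(nearest[i], dist[i][nxt]) for i in range(n)]

    return selected


# Toy low-dimensional coordinates standing in for reduced reaction features.
reduced = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0), (0.5, 0.0)]
print(kennard_stone(reduced, 3))  # [0, 2, 3]
```

The rule spreads the selected samples across the reduced space, so a small fine-tuning set covers the extremes first and then fills in the largest remaining gap.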

Tutorial

We provide a step-by-step tutorial that covers the whole workflow (data preprocessing, model training, model fine-tuning and testing, and baseline comparison) in $ Workflow of MetaRF - Tutorial.ipynb. We also provide a Colab version, which gives users easy access to our code and environment by clicking:

Note: In this tutorial, we use the procedure for the Buchwald-Hartwig HTE dataset as an example. The other two datasets follow the same procedure.

Questions

For further questions about the code, please contact kexinchen0526@gmail.com.

References

[1] Ahneman, D.T., Estrada, J.G., Lin, S., Dreher, S.D., Doyle, A.G.: Predicting reaction performance in C–N cross-coupling using machine learning. Science 360(6385), 186–190 (2018).

[2] Perera, D., Tucker, J.W., Brahmbhatt, S., Helal, C.J., Chong, A., Farrell, W., Richardson, P., Sach, N.W.: A platform for automated nanomole-scale reaction screening and micromole-scale synthesis in flow. Science 359(6374), 429–434 (2018).

[3] Saebi, M., Nan, B., Herr, J., Wahlers, J., Guo, Z., Zurański, A., Kogej, T., Norrby, P.-O., Doyle, A., Wiest, O., et al.: On the use of real-world datasets for reaction yield prediction. ChemRxiv (2021).

[4] Schwaller, P., Vaucher, A.C., Laino, T., Reymond, J.L.: Prediction of chemical reaction yields using deep learning. Machine Learning: Science and Technology 2(1), 015016 (2021).

[5] Probst, D., Schwaller, P., Reymond, J.L.: Reaction classification and yield prediction using the differential reaction fingerprint DRFP. Digital Discovery 1(2), 91–97 (2022).