
Primary language: Jupyter Notebook. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

DFA recommender


System-specific density functional recommender

The idea is to recommend the density functional approximation (DFA), within density functional theory (DFT), that best approximates the properties that would be obtained with a reference method (coupled cluster, experiment, etc.). We assume that a 3D geometry optimized at the B3LYP level is available as a starting point, since it has been observed that geometries optimized with DFT and with more accurate methods (e.g., CASPT2) are very similar. Therefore, we use a density fitting approach to decompose the electron density into node features on a molecular graph.

Because the definition of the "best" DFA is ambiguous (multiple DFAs perform similarly well in practice), we frame the problem as a "regress-then-classify" task. We build transfer learning models that directly predict the absolute difference between the reference result and the result of a DFA. We do this for every candidate DFA in the pool; by default, the workflow considers 48 DFAs that span multiple rungs of "Jacob's ladder". Finally, we sort the predicted differences and select the DFA with the lowest predicted difference.
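The final "classify" step described above amounts to sorting the per-DFA predictions and taking the minimum. The following is a minimal sketch of that step only, not the package's own implementation: the function name `recommend_dfa`, the four functional names, and the numbers are hypothetical placeholders chosen for illustration, and the regression models that would produce the predictions are not shown.

```python
def recommend_dfa(predicted_abs_diff, top_k=1):
    """Rank DFAs by predicted |reference - DFA| and return the top_k best.

    predicted_abs_diff: dict mapping a DFA name to the absolute difference
    predicted by that DFA's regression model (hypothetical input format).
    """
    # Sort ascending: a smaller predicted difference means closer to the reference.
    ranked = sorted(predicted_abs_diff.items(), key=lambda item: item[1])
    return ranked[:top_k]


# Placeholder predictions for illustration only (not results from the paper).
predicted = {"B3LYP": 4.2, "PBE0": 3.1, "SCAN": 5.0, "M06-2X": 3.4}

best = recommend_dfa(predicted)
ranking = recommend_dfa(predicted, top_k=len(predicted))
print("Recommended DFA:", best[0][0])
print("Full ranking:", [name for name, _ in ranking])
```

Returning the top few candidates rather than a single one can be useful here, since several DFAs often perform similarly well.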

Installation

  1. Clone this repo: git clone https://github.com/hjkgrp/dfa_recommender.git
  2. Set up a conda environment with the provided yaml file: conda env create -f dfa_recommender/devtools/conda-envs/test_env.yaml
  3. Install with pip: conda activate dfa_rec && cd dfa_recommender && pip install -e .
  4. Test that everything works as expected: python setup.py test

The installation should take less than 10 minutes on a normal laptop.

File structure

./dfa_recommender/
├── __init__.py
├── __pycache__
├── _version.py
├── data
├── dataset.py
├── df_class.py
├── df_utils.py
├── evaluate.py
├── ml_utils.py
├── net.py
├── predict.py
├── sampler.py
├── scripts
├── tests
├── tutorials-submitted
└── vat.py
  • All .py files contain Python functions and classes; their documentation and use cases are available in the API section of the Read the Docs documentation.
  • data contains the csv files, featurization, trained models, and optimized geometries for the VSS-452 and CSD-76 sets.
  • scripts contains Python scripts for quick model training and electron density processing.
  • tests contains the unit tests for the DFA recommender.
  • tutorials-submitted contains Jupyter notebooks that reproduce all the results in the paper. Please refer to the Tutorial section of the Read the Docs documentation for details.

Dependency

  • PyTorch (1.10.0), Psi4 (1.6.1), Pandas (1.4.3), Scikit-learn (1.1.1)
  • Tested on macOS (M1, M1 Pro) and Linux (RedHat 7)
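To compare an existing environment against the versions listed above, a small check along the following lines may help. This is only a sketch: the distribution names used as dictionary keys (`torch`, `psi4`, `pandas`, `scikit-learn`) are assumptions about how each package registers itself, and a conda-installed Psi4 in particular may not expose metadata under that name, in which case it is reported as not found even when it is installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the Dependency list; the distribution names are assumptions.
EXPECTED = {
    "torch": "1.10.0",
    "psi4": "1.6.1",
    "pandas": "1.4.3",
    "scikit-learn": "1.1.1",
}


def check_versions(expected):
    """Return {name: (wanted, found, matches)} for each listed distribution."""
    report = {}
    for dist, wanted in expected.items():
        try:
            found = version(dist)
        except PackageNotFoundError:
            found = None  # not installed, or registered under another name
        # startswith tolerates local build suffixes such as "1.10.0+cpu".
        matches = found is not None and found.startswith(wanted)
        report[dist] = (wanted, found, matches)
    return report


for dist, (wanted, found, matches) in check_versions(EXPECTED).items():
    status = "ok" if matches else "check"
    print(f"{dist}: expected {wanted}, found {found} [{status}]")
```

A mismatch does not necessarily mean the package will fail; the listed versions are the ones the project reports using.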

Citation

@Article {dfa_recommender,
author = {Duan, Chenru and Nandy, Aditya and Meyer, Ralf and Arunachalam, Naveen and Kulik, Heather J.},
title = {A Transferable Recommender Approach for Selecting the Best Density Functional Approximations in Chemical Discovery},
journal = {arXiv},
url = {https://arxiv.org/abs/2207.10747},
doi = {https://doi.org/10.48550/arXiv.2207.10747},
year = {2022},
}

Reproduction instructions

All the results reported in the paper above can be reproduced with the Jupyter notebooks in dfa_recommender/tutorials-submitted. These notebooks also contain code blocks that demonstrate how to use our models.

Developers

Chenru Duan at HJK Group@MIT

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.6.