Repository of publicly available quantum datasets

openQDC - Open Quantum Data Commons

Installing openQDC

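git clone git@github.com:OpenDrugDiscovery/openQDC.git
cd openQDC
# create the environment with mamba (or conda)
mamba env create -n openqdc -f env.yml
# activate the environment so that pip installs into it
mamba activate openqdc
pip install -e .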
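
After installing, a quick check from Python can confirm that the package is visible in the active environment. This is only a minimal sketch: the import name `openqdc` is an assumption inferred from the CLI name (this README does not state it), and `is_importable` is a hypothetical helper, not part of openQDC.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found in the current environment."""
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the package is imported as "openqdc" (same as the CLI name).
    name = "openqdc"
    status = "found" if is_importable(name) else "not found"
    print(f"{name}: {status}")
```

If the module is reported as not found, the most likely cause is that the `openqdc` environment was not active when `pip install -e .` was run.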

Tests

You can run tests locally with:

pytest

Documentation

You can build the documentation locally with:

mkdocs serve

Downloading Datasets

A command-line interface is available to list the available datasets and to download them. For more information, run openqdc --help.

# Display the available datasets
openqdc datasets

# Display the help message for the download command
openqdc download --help

# Download the Spice and QMugs datasets
openqdc download Spice QMugs
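
The download command can also be driven from a Python script, for example inside a data-preparation pipeline. The sketch below only shells out to the `openqdc download` subcommand shown above; `build_download_command` and `download` are hypothetical helper names, not part of openQDC, and no other CLI flags are assumed.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_download_command(datasets: Sequence[str]) -> List[str]:
    """Build the argument list for `openqdc download <names...>`."""
    if not datasets:
        raise ValueError("at least one dataset name is required")
    return ["openqdc", "download", *datasets]


def download(datasets: Sequence[str]) -> int:
    """Run the download command and return its exit code."""
    # Fail early with a clear message if the CLI is not installed.
    if shutil.which("openqdc") is None:
        raise RuntimeError("the openqdc CLI is not on PATH; install openQDC first")
    return subprocess.run(build_download_command(datasets), check=False).returncode


if __name__ == "__main__":
    # Only print the command here; call download([...]) to actually run it.
    print(" ".join(build_download_command(["Spice", "QMugs"])))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with dataset names.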

Overview of Datasets

We provide support for the following publicly available QM Potential Energy Datasets.

Potential Energy

| Dataset | # Molecules | # Conformers | Average Conformers per Molecule | Force Labels | Atom Types | QM Level of Theory | Off-Equilibrium Conformations |
|---|---|---|---|---|---|---|---|
| ANI | 57,462 | 20,000,000 | 348 | No | 4 | ωB97x:6-31G(d) | Yes |
| GEOM | 450,000 | 37,000,000 | 82 | No | 18 | GFN2-xTB | No |
| Molecule3D | 3,899,647 | 3,899,647 | 1 | No | 5 | B3LYP/6-31G* | No |
| NablaDFT | 1,000,000 | 5,000,000 | 5 | No | 6 | ωB97X-D/def2-SVP | |
| OrbNet Denali | 212,905 | 2,300,000 | 11 | No | 16 | GFN1-xTB | Yes |
| PCQM_PM6 | | | 1 | No | | PM6 | No |
| PCQM_B3LYP | 85,938,443 | 85,938,443 | 1 | No | | B3LYP/6-31G* | No |
| QMugs | 665,000 | 2,000,000 | 3 | No | 10 | GFN2-xTB, ωB97X-D/def2-SVP | No |
| QM7X | 6,950 | 4,195,237 | 603 | Yes | 7 | PBE0+MBD | Yes |
| SN2RXN | 39 | 452,709 | 11,600 | Yes | 6 | DSD-BLYP-D3(BJ)/def2-TZVP | |
| SolvatedPeptides | | 2,731,180 | | Yes | | revPBE-D3(BJ)/def2-TZVP | |
| Spice | 19,238 | 1,132,808 | 59 | Yes | 15 | ωB97M-D3(BJ)/def2-TZVPPD | Yes |
| tmQM | 86,665 | 86,665 | 1 | No | | TPSSh-D3BJ/def2-SVP | |
| Transition1X | | 9,654,813 | | Yes | | ωB97x/6-31G(d) | Yes |
| WaterClusters | 1 | 4,464,740 | | No | 2 | TTM2.1-F | Yes |
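
The "Average Conformers per Molecule" column appears to be the ratio of the conformer count to the molecule count, rounded to a whole number. As a rough sanity check, and not a statement of how the authors computed it, the sketch below recomputes that ratio for a few rows using the counts from the table; `average_conformers` is a hypothetical helper.

```python
from typing import Dict, Tuple

# (molecules, conformers) copied from the table above for a few rows.
COUNTS: Dict[str, Tuple[int, int]] = {
    "ANI": (57_462, 20_000_000),
    "GEOM": (450_000, 37_000_000),
    "QMugs": (665_000, 2_000_000),
}


def average_conformers(molecules: int, conformers: int) -> float:
    """Return the mean number of conformers per molecule."""
    if molecules <= 0:
        raise ValueError("molecules must be positive")
    return conformers / molecules


if __name__ == "__main__":
    for name, (n_mol, n_conf) in COUNTS.items():
        print(f"{name}: {average_conformers(n_mol, n_conf):.1f} conformers per molecule")
```

Several conformer counts in the table are themselves rounded (for example 20,000,000), so the recomputed ratios are approximate.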

Interaction energy

We also provide support for the following publicly available QM Noncovalent Interaction Energy Datasets.

Dataset
DES370K
DES5M
Metcalf
DESS66
DESS66x8
Splinter
X40
L7

CI Status

The CI runs tests and performs code quality checks for the following combinations:

  • The three major platforms: Windows, OSX and Linux.
  • The four latest Python versions.

The following workflows run on the main branch:

  • Lib build & Testing (test)
  • Code Sanity (linting and type analysis) (code-check)
  • Documentation Build (doc)
  • Pre-Commit (pre-commit)
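
To make the size of the test matrix concrete, the combinations described above can be enumerated as a Cartesian product. This is an illustrative sketch only: it assumes every platform is paired with every Python version, and the version labels are placeholders because this README does not name the exact versions. `ci_matrix` is a hypothetical helper, not taken from the project's CI configuration.

```python
from itertools import product
from typing import List, Sequence, Tuple

PLATFORMS = ["Windows", "OSX", "Linux"]
# Placeholder labels: the README only says "the four latest Python versions".
PYTHON_VERSIONS = ["latest", "latest-1", "latest-2", "latest-3"]


def ci_matrix(platforms: Sequence[str], versions: Sequence[str]) -> List[Tuple[str, str]]:
    """Return every (platform, python_version) pair in the matrix."""
    return list(product(platforms, versions))


if __name__ == "__main__":
    matrix = ci_matrix(PLATFORMS, PYTHON_VERSIONS)
    for platform, version in matrix:
        print(f"{platform} / Python {version}")
    print(f"{len(matrix)} combinations")
```

Under that assumption, three platforms and four versions give twelve jobs per test run.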

How to cite

All data in OpenQDC have already been published in scientific journals, and the full reference to the respective paper is attached to each dataset class. When citing data obtained from OpenQDC, cite both the original paper(s) the data come from and our paper on OpenQDC itself. The reference is:

ADD REF HERE LATER