/tools-and-data

A collection of tools and databases for atomistic machine learning

License: Creative Commons Zero v1.0 Universal (CC0-1.0)

Contributors: 2021 N. Artrith (nartrith@atomistic.net), T. Morawietz, H. Guo, and A. Urban

List of Public Tools and Data for Atomistic Machine Learning

A Collection of Public Open-Source Tools and Databases for Atomistic Machine Learning


Contributing

Everyone is welcome to contribute to this list. Contributors' names are added to the list of contributors at the top of this document.

ML atomistic potentials

ANN-based potential implementations

Entries are sorted by year of publication.

| Name | Features | Reference |
| --- | --- | --- |
| ænet | Capable of handling many chemical species | Artrith, Urban, Comput. Mater. Sci. 114 (2016) 135 |
| Amp | Large descriptor library | Khorshidi, Peterson, Comput. Phys. Commun. 207 (2016) 310 |
| ANI | Accurate potential for molecular systems | Smith, Isayev, Roitberg, Chem. Sci. 8 (2017) 3192 |
| TensorMol | Electrostatics and van der Waals interactions | Yao et al., Chem. Sci. 9 (2018) 2261 |
| DeePMD-kit | GPU support | Wang et al., Comput. Phys. Commun. 228 (2018) 178 |
| SchNetPack | Feature learning | Schütt et al., J. Chem. Theory Comput. 15 (2019) 448 |
| N2P2 | Behler-Parrinello neural network potential | Singraber et al., J. Chem. Theory Comput. 15 (2019) 1827 |
| SchNarc | Extension to multiple electronic states based on SchNet and SHARC | Westermayr et al., J. Phys. Chem. Lett. 11 (2020) 3828 |
| PANNA | Properties from neural network architectures | Lot et al., Comput. Phys. Commun. 256 (2020) 107402 |
| TorchANI | PyTorch implementation of ANI | Gao et al., J. Chem. Inf. Model. (2020), DOI: 10.1021/acs.jcim.0c00451 |

Other ML-based potential implementations

| Name | Description | Reference |
| --- | --- | --- |
| GAP/SOAP | GPR-based ML potential | Bartók et al., Phys. Rev. Lett. 104 (2010) 136403; Phys. Rev. B 87 (2013) 184115 |
| SNAP | Linear ML potential based on bispectrum components of the local neighbor density | Thompson et al., J. Comput. Phys. 285 (2015) 316 |
| AutoForce | SGPR-based ML potential (on-the-fly) | Hajibabaei et al., Phys. Rev. B 103 (2021) 214102 |

ML tools and packages for materials science and drug discovery applications

| Name | Description | Reference |
| --- | --- | --- |
| COMBO | Bayesian optimization library | Ueno et al., Materials Discovery 4 (2016) 18 |
| Magpie | ML framework | Ward, Wolverton et al., npj Computational Materials 2 (2016) 16028 |
| PROPhet | Neural networks for materials predictions | Kolb et al., Sci. Rep. 7 (2017) 1192 |
| SISSO | ML framework | Ouyang, Ghiringhelli et al., Phys. Rev. Mater. 2 (2018) 083802 |
| MatMiner | Feature construction library | Ward, Jain et al., Comput. Mater. Sci. 152 (2018) 60 |
| AFLOW-ML | ML framework | Gossett, Curtarolo et al., Comput. Mater. Sci. 152 (2018) 134 |
| Phoenics | Bayesian optimization and kernel density estimation | Häse et al., ACS Cent. Sci. 4 (2018) 1134 |
| JARVIS-ML | Property predictions | Choudhary et al., Phys. Rev. Materials 2 (2018) 083801 |
| OMDB-ML | Property predictions | Olsthoorn et al., Adv. Quantum Technol. 2 (2019) 1900023 |
| DeepChem | Democratizing deep learning for drug discovery | Ramsundar et al., O'Reilly Media (2019) |
| ShiftML | ML framework for predicting chemical shifts in molecular solids | Paruzzo et al., Nat. Commun. 9 (2019) 4501 |
| MaterialNet | A web-based graph explorer for materials science data | Choudhury et al., JOSS 5 (2020) 2105 |

Databases

General databases

| Name | Description | Reference |
| --- | --- | --- |
| NOMAD Repository | Open-access platform for data sharing | Draxl, Scheffler, J. Phys. Mater. 2 (2019) 036001 |
| Materials Cloud | Platform for open computational science | Talirz et al., arXiv:2003.12510 (2020) |

Databases for inorganic materials

| Name | Description | Reference |
| --- | --- | --- |
| American Mineralogist Crystal Structure Database | Crystal structure database for mineralogy | Downs and Hall-Wallace, American Mineralogist 88 (2003) 247 |
| COD | Crystallography Open Database | Gražulis et al. (2009), Gražulis (2012), Gražulis (2015), Merkys (2016), Quirós (2018), Vaitkus (2021) |
| AFLOW | Ab initio computational materials science database | Curtarolo et al., Comput. Mater. Sci. 58 (2012) 218 |
| NREL MatDB | Computational materials database with a focus on renewable energy applications | Stevanovic et al. (2012), Lany (2013), Lany (2015) |
| Materials Project | A materials genome approach to accelerating materials innovation | Jain et al., APL Materials 1 (2013) 011002 |
| OQMD | Database of DFT-calculated thermodynamic and structural materials properties | Kirklin et al., npj Comput. Mater. 1 (2015) 15010 |
| COMBO | Bayesian optimization library | Ueno et al., Materials Discovery 4 (2016) 18 |
| Open Catalyst Project | Using AI to model and discover new catalysts to address the energy challenges posed by climate change | Facebook AI and Carnegie Mellon (2020) |
| JARVIS-API | Integrated infrastructure for data-driven materials design | Choudhary et al., arXiv:2007.01831 (2020) |

Databases for organic molecules and materials

| Name | Description | Reference |
| --- | --- | --- |
| MoleculeNet | Large-scale benchmark for molecular machine learning | Wu et al., Chem. Sci. 9 (2018) 513 |
| FMODB | Database of quantum mechanical FMO calculations | Kato et al., J. Chem. Inf. Model. (2020), DOI: 10.1021/acs.jcim.0c00273 |
| QM-sym | Symmetrized quantum chemistry database of 135k organic molecules | Liang et al., Sci. Data 6 (2020) 213 |

Workflow management

| Name | Description | Reference |
| --- | --- | --- |
| Research Object Crate | A JSON-based approach for research object serialization | Bechhofer et al., Future Generation Computer Systems 29 (2013) 599-611 |
| Common Workflow Language | An open standard for analysis workflows and tools | Amstutz et al., Common Workflow Language, v1.0 (2016) |
| DLHub | Sharing of ML models and workflows | Chard et al., IEEE IPDPS (2019) 283-292; Blaiszik et al., MRS Commun. 9 (2019) 1125-1133 |

Peer-reviewed articles referring to this document

  1. H. Guo, Q. Wang, A. Stuke, A. Urban, and N. Artrith, Front. Energy Res., just accepted (2021), Open Access.
  2. A. M. Miksch, T. Morawietz, J. Kästner, A. Urban, and N. Artrith, Machine Learning: Science and Technology, in press (2021), Open Access, DOI: https://doi.org/10.1088/2632-2153/abfd96
  3. T. Morawietz and N. Artrith, J. Comput. Aided Mol. Des. 35, 557-586 (2021), Open Access, DOI: https://doi.org/10.1007/s10822-020-00346-6