This repository houses experiments in which we perform ML regressions using various methods. Our goal is to down-select the ML methods that give us the best prediction accuracy, uncertainty calibration, uncertainty sharpness, and score under a proper scoring rule.
Each folder in this repository contains Jupyter notebooks that document how we used various ML methods to perform regressions on our data, and the results thereof. Each of these notebooks uses the same set of features and data splits that we created from a couple of databases.
The `preprocessing` folder shows how we used various APIs to gather and preprocess our data into features.
Our primary data source is the database we created using GASpy: GASdb. Our secondary data source is Catalysis-Hub, although we ultimately stopped using this source for various reasons.
One way to convert our database of atomic structures into features is to fingerprint them using a method outlined in this paper.
Thus `fingerprints` in this repository reference this type of feature.
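As an illustration only (the element-property table and `fingerprint_site` helper below are hypothetical, not the repository's actual code), a fingerprint of this kind might summarize the elements coordinating an adsorption site as a fixed-length numeric vector:

```python
# Hypothetical sketch of an element-property fingerprint for an adsorption
# site. The property table and function are illustrative placeholders.

# Minimal table of elemental properties: (atomic number, Pauling electronegativity)
ELEMENT_PROPS = {
    "Cu": (29, 1.90),
    "Pt": (78, 2.28),
    "Al": (13, 1.61),
}

def fingerprint_site(coordinating_elements):
    """Turn the list of elements coordinating a site into a fixed-length
    numeric vector: [mean atomic number, mean electronegativity, count]."""
    props = [ELEMENT_PROPS[el] for el in coordinating_elements]
    n = len(props)
    mean_z = sum(p[0] for p in props) / n
    mean_en = sum(p[1] for p in props) / n
    return [mean_z, mean_en, float(n)]
```

Because the output has a fixed length regardless of how many atoms coordinate the site, it can be fed directly to standard regressors.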
Another way to featurize atomic structures is to use a StructureDataTransformer for a Crystal Graph Convolutional Neural Network (CGCNN).
Thus `sdt`s in this repository reference this type of feature.
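Conceptually, such a transformer turns an atomic structure into graph inputs for the CGCNN: per-atom features plus a neighbor list built from interatomic distances. The sketch below is a minimal, hypothetical illustration of the neighbor-list step (the `structure_to_graph` function and its cutoff are assumptions, not the actual StructureDataTransformer):

```python
import numpy as np

def structure_to_graph(positions, cutoff=3.0):
    """Build a simple neighbor-list graph from Cartesian coordinates.

    Returns a list of directed (i, j) edges for atom pairs within
    `cutoff` angstroms of each other. A real CGCNN input would also
    carry per-atom element embeddings and edge distance expansions.
    """
    positions = np.asarray(positions, dtype=float)
    # Pairwise displacement vectors and distance matrix
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    n = len(positions)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and dists[i, j] <= cutoff]
```

The resulting edge list is what lets the graph convolutions pass messages only between physically neighboring atoms.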
The written report of our findings can be found in the `paper` folder, along with all of the files needed to compile the report. Additionally, the report can be found on arXiv:
Methods for comparing uncertainty quantifications for material property predictions
Kevin Tran, Willie Neiswanger, Junwoong Yoon, Eric Xing, Zachary W. Ulissi.
arXiv:1912.10066.
Refer to our dependency list for everything required to run the Jupyter notebooks in this repository.