/uncertainty_benchmarking

Various code/notebooks to benchmark different ways we could estimate uncertainty in ML predictions.


This repository houses experiments in which we perform ML regressions with a variety of methods. Our goal is to down-select the methods that give us the best prediction accuracy, uncertainty calibration, uncertainty sharpness, and score under a proper scoring rule.
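To make these evaluation criteria concrete, here is a minimal sketch of how such metrics could be computed for a regression model that outputs a mean and a standard deviation per point. It assumes Gaussian predictive distributions and is purely illustrative; the function and variable names are not the repository's actual implementation.

import numpy as np
from scipy import stats

def regression_uncertainty_metrics(y_true, y_pred, y_std):
    """Illustrative metrics, assuming a Gaussian predictive N(y_pred, y_std**2)."""
    # Accuracy: mean absolute error and root-mean-square error
    mae = np.mean(np.abs(y_true - y_pred))
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

    # Calibration: expected vs. observed coverage of central prediction intervals
    expected = np.linspace(0.05, 0.95, 19)
    observed = []
    for p in expected:
        lo, hi = stats.norm.interval(p, loc=y_pred, scale=y_std)
        observed.append(np.mean((y_true >= lo) & (y_true <= hi)))
    miscalibration = np.mean(np.abs(np.array(observed) - expected))

    # Sharpness: average width of the predictive distributions
    sharpness = np.mean(y_std)

    # Proper scoring rule: mean negative log-likelihood
    nll = -np.mean(stats.norm.logpdf(y_true, loc=y_pred, scale=y_std))

    return {"mae": mae, "rmse": rmse, "miscalibration": miscalibration,
            "sharpness": sharpness, "nll": nll}

A well-performing method would combine low error, low miscalibration, low (sharp) predicted standard deviations, and a low negative log-likelihood.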

Content

Each folder in this repository contains Jupyter notebooks that document how we used various ML methods to perform regressions on our data, along with the results. Each of these notebooks uses the same set of features and data splits, which we created from a couple of databases.
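Using identical splits across notebooks is what makes the methods comparable. A minimal sketch of how a shared, reproducible split could be produced is shown below; the file layout and seed are hypothetical and not taken from this repository.

from sklearn.model_selection import train_test_split

# Hypothetical sketch: every notebook reads the same cached features and uses
# the same fixed seed, so the train/validation/test partitions are identical.
SEED = 42

def make_splits(features, targets):
    x_train, x_test, y_train, y_test = train_test_split(
        features, targets, test_size=0.2, random_state=SEED)
    x_train, x_val, y_train, y_val = train_test_split(
        x_train, y_train, test_size=0.25, random_state=SEED)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)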

Data wrangling

The preprocessing folder shows how we used various APIs to gather and preprocess our data into features.

Data sources

Our primary data source is the database we created using GASpy: GASdb. Our secondary data source is Catalysis-Hub, although we ultimately stopped using it for various reasons.

Preprocessing

One way to convert our database of atomic structures into features is to fingerprint them using the method outlined in this paper. References to fingerprints in this repository therefore refer to this type of feature.
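As a rough illustration of the idea (not the exact features from the cited paper), a fingerprint reduces the local chemical environment of a site to a fixed-length numeric vector built from elemental properties. The property table and summary statistics below are hypothetical.

import numpy as np

# Hypothetical elemental-property table: (atomic number, Pauling electronegativity)
ELEMENT_PROPS = {"Cu": (29, 1.90), "Pt": (78, 2.28), "Al": (13, 1.61)}

def fingerprint(coordinated_elements):
    """Encode the elements coordinated with a site as a fixed-length vector."""
    props = np.array([ELEMENT_PROPS[el] for el in coordinated_elements])
    # Summarize with per-property mean/min/max plus the coordination count
    return np.concatenate([props.mean(axis=0), props.min(axis=0),
                           props.max(axis=0), [len(coordinated_elements)]])

# e.g., fingerprint(["Cu", "Cu", "Pt"]) -> a 7-dimensional feature vector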

Another way to featurize atomic structures is to use a StructureDataTransformer for a Crystal Graph Convolutional Neural Network (CGCNN). References to sdts in this repository therefore refer to this type of feature.
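The general idea behind this kind of featurization is to represent a structure as a graph whose nodes are atoms and whose edges connect nearby atoms, with interatomic distances expanded on a Gaussian basis. The sketch below illustrates that concept only; it is not the StructureDataTransformer API, and it ignores periodic boundary conditions that a real crystal graph would need.

import numpy as np

def structure_to_graph(positions, atomic_numbers, cutoff=6.0, n_gaussians=40):
    """CGCNN-style graph construction sketch (illustrative, non-periodic)."""
    positions = np.asarray(positions, dtype=float)
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    centers = np.linspace(0.0, cutoff, n_gaussians)
    width = centers[1] - centers[0]

    edges, edge_feats = [], []
    n = len(atomic_numbers)
    for i in range(n):
        for j in range(n):
            if i != j and dists[i, j] <= cutoff:
                edges.append((i, j))
                # Gaussian expansion of the interatomic distance as the edge feature
                edge_feats.append(np.exp(-((dists[i, j] - centers) ** 2) / width ** 2))

    node_feats = np.array(atomic_numbers)  # would be one-hot/embedded in practice
    return node_feats, np.array(edges), np.array(edge_feats)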

Written report

The written report of our findings is in the paper folder, along with all of the files needed to compile it. The report is also available on arXiv:

Methods for comparing uncertainty quantifications for material property predictions
Kevin Tran, Willie Neiswanger, Junwoong Yoon, Eric Xing, Zachary W. Ulissi.
arXiv:1912.10066.

Dependencies

Refer to our list of dependencies for what is required to run the Jupyter notebooks in this repository.