OPTIMADE NOMAD CoE Tutorial Exercises

Matthew Evans, UCLouvain

Introduction

These open-ended exercises are provided to accompany NOMAD CoE Tutorial 6: OPTIMADE, run across the 7th and 8th of September 2021.

This document is hosted on GitHub, and all feedback related to the exercises can be provided as an issue in that repository.

If you would like to get involved with the OPTIMADE consortium, you can find some more details on the OPTIMADE home page.

The OPTIMADE specification defines a JSON API that can be accessed with many different tools. You will have heard about three such tools in the tutorial:

  1. The Materials Cloud web-based OPTIMADE client.
  2. The optimade.science web-based aggregator.
  3. pymatgen's built-in OPTIMADE client.

Each of these clients can send requests to multiple OPTIMADE providers simultaneously, based on the provider list at https://providers.optimade.org/.

You may also wish to familiarise yourselves with the OPTIMADE API by writing your own queries, scripts or code. Some possible options:

  • Craft (or copy) your own URL queries to a particular OPTIMADE implementation. Some web browsers (e.g. Firefox) will automatically format the JSON response for you (see Exercise 1).
  • Use command-line tools such as curl or wget to receive data in your terminal, or pipe it to a file. You could use the tool jq to format the JSON response.
  • Make an appropriate HTTP request from your programming language of choice. For Python, you could use the standard library urllib.request or the more ergonomic external library requests. In JavaScript, you can just use fetch(...).
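
As a minimal sketch of the last option, the snippet below builds a query URL for the structures endpoint and fetches it using only the Python standard library. The base URL https://example.org/optimade/v1 is a placeholder rather than a real provider, and the helper names build_query_url and fetch_json are illustrative; they are not part of any OPTIMADE tool.

```python
import json
import urllib.parse
import urllib.request

# Placeholder base URL (assumption): substitute a real one taken from
# https://providers.optimade.org/.
BASE_URL = "https://example.org/optimade/v1"


def build_query_url(base_url, endpoint="structures", **params):
    """Return the URL for an endpoint, with query parameters percent-encoded."""
    url = f"{base_url.rstrip('/')}/{endpoint}"
    if params:
        # quote_via=quote encodes spaces as %20 rather than "+".
        url += "?" + urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return url


def fetch_json(url, timeout=30):
    """Send a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = build_query_url(BASE_URL, filter='elements HAS "Si"', page_limit=5)
print(url)

# Sending the request needs network access and a real base URL:
# response = fetch_json(url)
# print(response["meta"])
```

The printed URL can also be pasted into a browser or passed to curl, so the same helper is handy for checking queries by hand.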

Exercise 1

The aim of this exercise is to familiarise yourself with the OPTIMADE JSON API. In the recent OPTIMADE paper [1], we provided the number of results to a set of queries across all OPTIMADE implementations, obtained by applying the same filter to the structures endpoint of each database. The filters are:

  • Query for structures containing a group IV element: elements HAS ANY "C", "Si", "Ge", "Sn", "Pb".

  • As above, but return only binary phases: elements HAS ANY "C", "Si", "Ge", "Sn", "Pb" AND nelements=2.

  • This time, exclude lead and return ternary phases: elements HAS ANY "C", "Si", "Ge", "Sn" AND NOT elements HAS "Pb" AND elements LENGTH 3.

  • In your browser, try visiting the links in Table 1 of the OPTIMADE paper [1] (clickable links in arXiv version [2]), which is reproduced below.

    • Familiarise yourself with the standard JSON:API output fields (data, meta and links).
    • You will find the crystal structures returned for the query as a list under the data key, with the OPTIMADE-defined fields listed under the attributes of each list entry.
    • The meta field provides useful information about your query, e.g. data_returned shows how many results there are in total, not just in the current page of the response (you can check if the table still contains the correct number of entries, or if it is now out of date).
    • The links field provides links to the next or previous pages of your response, in case you requested more structures than the page_limit for that implementation.
  • Choose one particular entry to focus on: replace the filter URL parameter with /<structure_id> for the id of one particular structure (e.g. https://example.org/optimade/v1/structures/<structure_id>).

  • Explore other endpoints provided by each of these providers. If they serve "extra" fields (i.e. those containing the provider prefix), try to find out what these fields mean by querying the /info/structures endpoint.

  • Try performing the same queries with some of the tools listed above, or in scripts of your own design.
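
For the scripted route, one possible starting point is sketched below: it percent-encodes the three filters from this exercise into structures URLs and pulls the headline numbers out of a JSON:API response. The base URL is a placeholder, the dictionary keys and function names are illustrative, and sample_response is a hand-written stand-in containing only the keys discussed above, not real data from any provider.

```python
import urllib.parse

# The three filters from this exercise; the dictionary keys are only labels.
FILTERS = {
    "group_iv": 'elements HAS ANY "C", "Si", "Ge", "Sn", "Pb"',
    "binary": 'elements HAS ANY "C", "Si", "Ge", "Sn", "Pb" AND nelements=2',
    "ternary_no_pb": (
        'elements HAS ANY "C", "Si", "Ge", "Sn" '
        'AND NOT elements HAS "Pb" AND elements LENGTH 3'
    ),
}


def structures_url(base_url, filter_string):
    """Build a /structures URL with the filter percent-encoded."""
    query = urllib.parse.urlencode(
        {"filter": filter_string}, quote_via=urllib.parse.quote
    )
    return f"{base_url.rstrip('/')}/structures?{query}"


def summarise(response):
    """Collect the total count, page size, next link and ids from a response."""
    next_link = (response.get("links") or {}).get("next")
    # `next` may be a plain URL string or a link object with an `href` key.
    if isinstance(next_link, dict):
        next_link = next_link.get("href")
    data = response.get("data", [])
    return {
        "total": (response.get("meta") or {}).get("data_returned"),
        "on_this_page": len(data),
        "next_page": next_link,
        "ids": [entry["id"] for entry in data],
    }


# Hand-written stand-in for a real response (assumption, not provider data).
sample_response = {
    "data": [
        {
            "id": "example-1",
            "type": "structures",
            "attributes": {"elements": ["C", "Si"], "nelements": 2},
        }
    ],
    "meta": {"data_returned": 42},
    "links": {"next": "https://example.org/optimade/v1/structures?page_offset=1"},
}

for label, filter_string in FILTERS.items():
    print(label, structures_url("https://example.org/optimade/v1", filter_string))
print(summarise(sample_response))
```

Comparing the "total" value against the corresponding table entry is a quick way to see whether a provider's data has changed since the table was compiled.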

Number of results returned by each provider for the three filters above (N1, N2 and N3, in the order listed):

Provider                                                 N1         N2         N3
AFLOW                                               700,192     62,293    382,554
Crystallography Open Database (COD)                 416,314      3,896     32,420
Theoretical Crystallography Open Database (TCOD)      2,631        296        660
Materials Cloud                                     886,518    801,382    103,075
Materials Project                                    27,309      3,545     10,501
Novel Materials Discovery Laboratory (NOMAD)      3,359,594    532,123  1,611,302
Open Database of Xtals (odbx)                            55         54          0
Open Materials Database (omdb)                       58,718        690      7,428
Open Quantum Materials Database (OQMD)              153,113     11,011     70,252

[1] Andersen et al., "OPTIMADE, an API for exchanging materials data", Sci. Data 8, 217 (2021), doi:10.1038/s41597-021-00974-z.

[2] Andersen et al., "OPTIMADE, an API for exchanging materials data" (2021), arXiv:2103.02068.

Exercise 2

The filters from Exercise 1 screened for compounds containing a group IV element, then restricted the results to binary phases, and finally excluded lead and returned only ternary phases.

  • Choose a suitable database and modify the filters from Exercise 1 to search for binary [III]-[V] semiconductors.
    • A "suitable" database here is one that you think will have good coverage across this chemical space.
  • Using the chemical_formula_anonymous field, investigate the most common stoichiometric ratios between the constituent elements, e.g. 1:1, 2:1, etc.
    • You may need to follow pagination links (links->next in the response) to access all available data for your query, or you can try adding the page_limit=100 URL parameter to request more structures per response.
  • Apply the same filter to another database and assess the similarity between the results, thinking carefully about how the different focuses of each database and different methods in their construction/curation could lead to biases in this outcome.
    • For example, an experimental database may have one crystal structure entry per experimental sample studied, in which case the most useful (or "fashionable") compositions will return many more entries, especially when compared to a database that curates crystal structures such that each ideal crystal has one canonical entry (e.g., a database of minerals).
  • Try to use the query you have constructed in the multi-provider clients (linked above), to query all OPTIMADE providers simultaneously.
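
The pagination and counting steps above could be sketched as follows. The fetch argument is any callable that maps a URL to a decoded JSON response, which keeps the example runnable offline; here it is a dictionary lookup over two hand-written pages (fake_pages is made-up data, and all function names are illustrative). The filter itself is left for you to construct.

```python
import re
from collections import Counter


def iter_entries(first_url, fetch):
    """Yield every entry for a query, following links->next until it is empty."""
    url = first_url
    while url:
        response = fetch(url)
        yield from response.get("data", [])
        next_link = (response.get("links") or {}).get("next")
        # `next` may be a URL string, a link object with an `href`, or null.
        url = next_link.get("href") if isinstance(next_link, dict) else next_link


def stoichiometric_ratio(anonymous_formula):
    """Convert an anonymous formula such as 'A2B' into the ratio string '2:1'."""
    counts = re.findall(r"[A-Z][a-z]*(\d*)", anonymous_formula)
    return ":".join(count or "1" for count in counts)


def count_ratios(entries):
    """Tally stoichiometric ratios over an iterable of structure entries."""
    return Counter(
        stoichiometric_ratio(entry["attributes"]["chemical_formula_anonymous"])
        for entry in entries
    )


# Two hand-written pages standing in for a real paginated response.
fake_pages = {
    "page-1": {
        "data": [
            {"id": "a", "attributes": {"chemical_formula_anonymous": "AB"}},
            {"id": "b", "attributes": {"chemical_formula_anonymous": "A2B"}},
        ],
        "links": {"next": "page-2"},
    },
    "page-2": {
        "data": [{"id": "c", "attributes": {"chemical_formula_anonymous": "AB"}}],
        "links": {"next": None},
    },
}

ratios = count_ratios(iter_entries("page-1", fake_pages.__getitem__))
print(ratios.most_common())  # [('1:1', 2), ('2:1', 1)]
```

Against a real provider, fetch would be a function that performs the HTTP request; adding the response_fields=chemical_formula_anonymous URL parameter keeps each page small, since only that field is needed here.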

Exercise 3

There are many useful properties that the OPTIMADE specification has not standardized. This is typically because the property requires additional context to be useful (e.g., a "band gap" means little without a description of how it was calculated or measured), or because the property is only meaningful in the context of a particular database (e.g., relative energies that depend on other reference calculations). For this reason, the OPTIMADE specification allows implementations to serve their own fields with an appropriate "provider prefix" to the field name, and a description at the /info/structures endpoint.

One computed property that is key to many high-throughput studies is the chemical stability of a crystal structure, i.e. whether the structure is predicted to spontaneously decompose into a different phase (or phases). This is typically computed as the distance from the convex hull in composition-energy space, with a value of 0 (or <0, if the target structure was not used to compute the hull itself) indicating a stable structure.
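
To make the hull distance concrete, here is a minimal sketch for a binary system A(1-x)B(x), where the hull reduces to the lower convex envelope of (x, formation energy) points. This is a simplified illustration with made-up energies in eV/atom, not the procedure used by any particular database; real studies typically work with multi-component hulls and dedicated libraries.

```python
def lower_hull(points):
    """Lower convex hull of (x, energy) points with distinct x values."""
    hull = []
    for p in sorted(points):
        # Drop the last vertex while it lies on or above the line hull[-2] -> p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross > 0:
                break
            hull.pop()
        hull.append(p)
    return hull


def hull_energy(hull, x):
    """Linearly interpolate the hull energy at composition x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    raise ValueError("composition lies outside the range covered by the hull")


def hull_distance(reference_phases, x, energy):
    """Energy of a target phase relative to the hull of the reference phases."""
    lowest = {}
    for xi, ei in reference_phases:
        # Keep only the lowest-energy phase at each composition.
        lowest[xi] = min(ei, lowest.get(xi, ei))
    return energy - hull_energy(lower_hull(list(lowest.items())), x)


# Made-up formation energies (eV/atom): both elements plus two compounds.
references = [(0.0, 0.0), (0.25, -0.1), (0.5, -0.4), (1.0, 0.0)]

print(hull_distance(references, 0.5, -0.4))   # on the hull: 0.0
print(hull_distance(references, 0.25, -0.1))  # above the hull: about 0.1
print(hull_distance(references, 0.5, -0.5))   # not in the hull: about -0.1
```

The last call shows the "<0" case mentioned above: a target phase that was not included when building the hull can sit below it.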

  • Interrogate the /info/structures endpoints of the OPTIMADE implementations that serve DFT data (e.g., Materials Project, AFLOW, OQMD, etc.) and identify those that serve a field that could correspond to hull distance, or other stability metrics.
  • Construct a filter that allows you to screen a database for metastable materials (i.e., 0 < δ < 25 meV/atom) according to this metric.
  • Try to create a filter that can be applied to multiple databases simultaneously (e.g., apply ?filter=_databaseA_hull_distance < 25 OR _databaseB_stability < 25). What happens when you run this filter against a database that does not contain the field?
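
The first two steps above could be sketched as below. The field name _databaseA_hull_distance is the hypothetical one used in the example filter, and sample_info is a hand-written stand-in for an /info/structures response that keeps only the properties dictionary under data. Units differ between providers, so check the field description before choosing a threshold.

```python
def provider_fields(info_response):
    """Return {field name: description} for fields with a provider prefix."""
    properties = (info_response.get("data") or {}).get("properties", {})
    return {
        name: details.get("description", "")
        for name, details in properties.items()
        if name.startswith("_")  # provider-specific fields start with "_"
    }


def metastable_filter(field, upper, lower=0):
    """Build a filter string selecting lower < field < upper."""
    return f"{field} > {lower} AND {field} < {upper}"


# Hand-written stand-in for an /info/structures response (assumption).
sample_info = {
    "data": {
        "properties": {
            "elements": {"description": "Names of the elements in the structure."},
            "_databaseA_hull_distance": {
                "description": "Hypothetical field: distance above the convex hull.",
                "unit": "meV/atom",
            },
        }
    }
}

for name, description in provider_fields(sample_info).items():
    print(f"{name}: {description}")

print(metastable_filter("_databaseA_hull_distance", 25))
# _databaseA_hull_distance > 0 AND _databaseA_hull_distance < 25
```

Scanning the printed descriptions for words such as "hull" or "stability" is usually enough to shortlist candidate fields for each provider.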

Exercise 4

As a final exercise, consider your own research problems and how you might use OPTIMADE. If you have any suggestions or feedback about how OPTIMADE can be made more useful for you, please start a discussion on the OPTIMADE MatSci forum or raise an issue at the appropriate GitHub repository (Materials-Consortia GitHub).

Some potential prompts:

  • What additional fields or entry types should OPTIMADE standardize to be most useful to you?
  • How could the existing tools be improved, or what new tools could be created to make OPTIMADE easier to use?
  • What features from other APIs/databases that you use could be adopted within OPTIMADE?