Tools for making OPTIMADE APIs from various formats of structural data (e.g. an archive of CIF files).
This repository contains the src/optimade-maker Python package and the corresponding CLI tool optimake that work towards this aim. Features include:
- definition of a config file format (optimade.yaml) for annotating data archives to be used in the OPTIMADE ecosystem;
- conversion of the raw data into corresponding OPTIMADE types using pre-existing parsers (e.g., ASE for structures);
- conversion of the annotated data archive into an intermediate JSONLines file format that can be ingested into a database and used to serve a full OPTIMADE API;
- serving either an annotated data archive or a JSONLines file as an OPTIMADE API (using the optimade-python-tools reference server implementation).
See ./examples for a more complete set of supported formats and corresponding optimade.yaml config files.
To annotate your structural data for optimade-maker, the data archive needs to be accompanied by an optimade.yaml config file. The following is a simple example for a zip archive (structures.zip) of CIF files together with an optional property file (data.csv):
config_version: 0.1.0
database_description: Simple database
entries:
  - entry_type: structures
    entry_paths:
      - file: structures.zip
        matches:
          - cifs/*/*.cif
    # (optional) property file and definitions:
    property_paths:
      - file: data.csv
    property_definitions:
      - name: energy
        title: Total energy per atom
        description: The total energy per atom as computed by DFT
        unit: eV/atom
        type: float
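To illustrate how a matches pattern such as cifs/*/*.cif selects files inside the archive, here is a minimal Python sketch. It is not the package's own matching code: the helper name match_members is hypothetical, the in-memory archive and its README.txt member are made up for the demonstration, and the sketch assumes glob semantics in which * does not cross a / separator.

```python
import fnmatch
import io
import zipfile


def match_members(names, pattern):
    """Return the names matching a glob pattern, where '*' does not cross '/'."""
    pattern_parts = pattern.split("/")
    matched = []
    for name in names:
        name_parts = name.split("/")
        # Compare component by component so that '*' stays within one path level.
        if len(name_parts) == len(pattern_parts) and all(
            fnmatch.fnmatchcase(part, pat)
            for part, pat in zip(name_parts, pattern_parts)
        ):
            matched.append(name)
    return matched


# Build a tiny in-memory stand-in for structures.zip (contents are made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in ["cifs/set1/101.cif", "cifs/set2/102.cif", "cifs/README.txt"]:
        archive.writestr(name, "")

with zipfile.ZipFile(buffer) as archive:
    print(match_members(archive.namelist(), "cifs/*/*.cif"))
```

Under these assumptions only the two .cif files one level below cifs/ are selected; README.txt is skipped because it has too few path components.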
optimade-maker will assign an id for each structure based on its full path in the archive, following a simple deterministic rule: from the set of all archive paths, the maximum common path prefix and postfix (including file extensions) are removed. E.g.
structures.zip/cifs/set1/101.cif
structures.zip/cifs/set2/102.cif
produces ["set1/101", "set2/102"].
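The rule can be sketched in a few lines of Python. This is an illustrative reimplementation based only on the description above, not the package's actual code: the function name derive_ids is hypothetical, and the sketch assumes that the prefix and postfix are compared per path component, with a shared file extension removed at the end. The real implementation may behave differently in edge cases.

```python
import posixpath


def derive_ids(paths):
    """Strip shared leading/trailing path components and a shared file extension."""
    parts = [p.split("/") for p in paths]
    shortest = min(len(p) for p in parts)

    # Count leading path components shared by every path (always keep one component).
    start = 0
    while start < shortest - 1 and len({p[start] for p in parts}) == 1:
        start += 1

    # Count trailing path components shared by every path.
    end = 0
    while end < shortest - start - 1 and len({p[-1 - end] for p in parts}) == 1:
        end += 1

    trimmed = ["/".join(p[start:len(p) - end]) for p in parts]

    # Drop the file extension only if all remaining names share it.
    extensions = {posixpath.splitext(t)[1] for t in trimmed}
    if len(extensions) == 1:
        trimmed = [posixpath.splitext(t)[0] for t in trimmed]
    return trimmed


print(derive_ids([
    "structures.zip/cifs/set1/101.cif",
    "structures.zip/cifs/set2/102.cif",
]))
```

For the two example paths this sketch yields set1/101 and set2/102, matching the ids listed above.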
The property files need to refer either to these ids or to the full path in the archive in order to be associated with a structure. E.g., a possible property CSV file could be
id,energy
set1/101,2.5
structures.zip/cifs/set2/102.cif,3.2
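As a rough illustration of how such rows could be associated with structures, the following Python sketch accepts either form of identifier. It is an assumption-laden sketch rather than the package's implementation: the helper load_properties and the path_to_id mapping are hypothetical names, and values are kept as raw strings instead of being converted according to the property definitions.

```python
import csv
import io

# Mapping from full archive path to the assigned id, as in the example above.
path_to_id = {
    "structures.zip/cifs/set1/101.cif": "set1/101",
    "structures.zip/cifs/set2/102.cif": "set2/102",
}

csv_text = """id,energy
set1/101,2.5
structures.zip/cifs/set2/102.cif,3.2
"""


def load_properties(csv_file, path_to_id):
    """Return {entry id: {property name: raw string value}} for each CSV row."""
    known_ids = set(path_to_id.values())
    properties = {}
    for row in csv.DictReader(csv_file):
        key = row.pop("id")
        # Accept either the short id or the full archive path.
        entry_id = key if key in known_ids else path_to_id.get(key)
        if entry_id is None:
            raise KeyError(f"No structure found for property row {key!r}")
        properties[entry_id] = row
    return properties


props = load_properties(io.StringIO(csv_text), path_to_id)
print(props)
```

Both rows end up keyed by the short id, so downstream code only has to deal with one identifier form.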
Install with
pip install optimade-maker
This will also make the optimake CLI utility available.
For a folder containing the data archive and the optimade.yaml file (such as in /examples), run
- optimake convert . to just convert the entry into the JSONL format (see below);
- optimake serve . to start the OPTIMADE API (this also first converts the entry, if needed).
For more detailed information see also optimake --help.
As described above, optimade-maker works via an intermediate JSONLines file representation of an OPTIMADE API (see also the corresponding issue in the specification).
This file should provide enough metadata to spin up an OPTIMADE API with many different entry types.
The format is as follows:
- First line must be a dictionary with the key x-optimade, containing a sub-dictionary of metadata (such as the OPTIMADE API version).
- Second line contains the info/structures endpoint.
- Third line contains the info/references endpoint, if present.
- Then each line contains an entry from the corresponding individual structure/reference endpoints.
{"x-optimade": {"meta": {"api_version": "1.1.0"}}}
{"type": "info", "id": "structures", "properties": {...}}
{"type": "info", "id": "references", "properties": {...}}
{"type": "structures", "id": "1234", "attributes": {...}}
{"type": "structures", "id": "1235", "attributes": {...}}
{"type": "references", "id": "sfdas", "attributes": {...}}
NOTE: the info/ endpoints in OPTIMADE v1.2.0 will include type and id as well.
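A minimal Python sketch of reading this layout is given below. It is only an illustration of the format described above, not code from optimade-maker: the function name read_optimade_jsonl is hypothetical, the {...} placeholders from the example are replaced by empty dictionaries so that each line is valid JSON, and the sketch assumes that the info lines carry type and id fields as shown in the example.

```python
import json
from collections import defaultdict


def read_optimade_jsonl(lines):
    """Split JSONL lines into the header metadata, info records and data entries."""
    records = [json.loads(line) for line in lines if line.strip()]
    if not records or "x-optimade" not in records[0]:
        raise ValueError("first line must be a dictionary with the key 'x-optimade'")

    header = records[0]["x-optimade"]
    info = {}
    entries = defaultdict(list)
    for record in records[1:]:
        if record["type"] == "info":
            # e.g. the info/structures or info/references endpoint
            info[record["id"]] = record
        else:
            # an individual structures/references entry
            entries[record["type"]].append(record)
    return header, info, dict(entries)


example = """\
{"x-optimade": {"meta": {"api_version": "1.1.0"}}}
{"type": "info", "id": "structures", "properties": {}}
{"type": "info", "id": "references", "properties": {}}
{"type": "structures", "id": "1234", "attributes": {}}
{"type": "structures", "id": "1235", "attributes": {}}
{"type": "references", "id": "sfdas", "attributes": {}}
"""

header, info, entries = read_optimade_jsonl(example.splitlines())
print(header["meta"]["api_version"], sorted(info), {k: len(v) for k, v in entries.items()})
```

Grouping on the type field rather than on line position keeps the reader tolerant of a missing info/references line, which the format allows.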
The initial prototype was created at the Paul Scherrer Institute, Switzerland, in the week of 12th-16th June 2023.
Authors (alphabetical):
- Kristjan Eimre
- Matthew Evans
- Giovanni Pizzi
- Gian-Marco Rignanese
- Jusong Yu
- Xing Wang