This repo contains alpha-stage LinkML-oriented tools, primarily in support of BBOP-affiliated projects like the National Microbiome Data Collaborative. As they mature, they will be migrated to other GitHub organizations, like https://github.com/microbiomedata
The tools are written in Python and managed with Poetry.
Installation notes are available for:
User interfaces are built with Click.
In a local checkout of this repo, tools can be run with poetry run <tool>.
There are also Makefile rules that illustrate the usage of command line parameters. These rules assume that the following GH repos have been cloned into directories located in the same parent directory as this repo.
Converting one or more machine-readable documents into a DataHarmonizer interface
We use DataHarmonizer to build interfaces for gathering standards-based data, such as biosample metadata. For example, one NMDC biosample metadata interface might combine the soil package/class from MiXS with the biosample class from the nmdc-schema.
Additional links for MiXS:
In other words, the LinkML slots describing the soil class and the biosample class will appear as columns in one DataHarmonizer interface. The two classes might both define a slot with a shared name, so the code in this repo must support resolving differences in how those shared slots are defined (with as much automation as possible).
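The shared-slot problem can be sketched roughly as follows. This is a minimal illustration, not the code in this repo: the slot dictionaries, their attribute values, and the merge_slots helper are all hypothetical, and real LinkML slot definitions carry many more attributes. The sketch takes the union of two classes' slots, lets one class win on a name clash, and reports each differing attribute so it can be reviewed by hand.

```python
# Hypothetical, simplified slot definitions: slot name -> attribute dict.
# The values are invented for illustration and are not taken from MiXS or nmdc-schema.
soil_slots = {
    "depth": {"range": "quantity value", "required": False},
    "ph": {"range": "double", "required": False},
}
biosample_slots = {
    "depth": {"range": "QuantityValue", "required": True},
    "id": {"range": "string", "required": True},
}


def merge_slots(primary, secondary):
    """Union two slot dicts; on a name clash keep `primary` and record the differences."""
    merged = dict(secondary)
    conflicts = []
    for name, definition in primary.items():
        other = secondary.get(name)
        if other is not None:
            # Compare attribute by attribute so each disagreement is reported separately.
            for key in sorted(set(definition) | set(other)):
                if definition.get(key) != other.get(key):
                    conflicts.append((name, key, definition.get(key), other.get(key)))
        merged[name] = definition
    return merged, conflicts


# Assumption for this sketch: the biosample class takes precedence over soil.
merged, conflicts = merge_slots(biosample_slots, soil_slots)
print(sorted(merged))
for conflict in conflicts:
    print(conflict)
```

Letting one class win silently would hide real disagreements, so the sketch returns the conflicts alongside the merged result; which class should take precedence is a project decision rather than something the sketch settles.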
Another requirement is supporting institution- or project-specific DataHarmonizer columns. The requirements of EMSL are provided as the Soil-NMDC-Template_Compile Google Sheet. Requirements from JGI can be found at ???
Automatic conversion of one class from one LinkML file to a DataHarmonizer template:
poetry run linkml_to_dh_no_annotations \
--model_yaml ../mixs-source/model/schema/mixs.yaml \
--model_class soil \
--tsv_out target/data.tsv
Notes:
- this data.tsv file still needs to be placed into a DH template folder and converted into data.js
- the DH sections are arranged alphabetically, as are the columns within the sections. It is assumed that the template builder will want to reorder the sections at least, especially putting some "identifiers" section first, with the primary key column first within the section.
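The reordering described in the last note could look something like the sketch below. It does not read the real data.tsv; the (section, column) pairs, the section and column names, and the reorder helper are hypothetical stand-ins chosen only to show the idea of moving one section to the front and one column to the front of its section while leaving everything else alphabetical.

```python
# Hypothetical (section, column) pairs in the alphabetical order the tool emits.
rows = [
    ("environment", "depth"),
    ("environment", "ph"),
    ("identifiers", "collection_date"),
    ("identifiers", "sample_id"),
]


def reorder(rows, first_section, primary_key):
    """Put `first_section` first and `primary_key` first within its section."""

    def sort_key(row):
        section, column = row
        # False sorts before True, so the chosen section and column lead;
        # everything else keeps its alphabetical order.
        return (section != first_section, section, column != primary_key, column)

    return sorted(rows, key=sort_key)


ordered = reorder(rows, "identifiers", "sample_id")
for section, column in ordered:
    print(section, column)
```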
TODOS:
- explain other click options
- elaborate section and column ordering
- pattern tabulation seems broken
make soil_biosample_dh
expands to
poetry run linkml_to_dh_no_annotations \
--model_yaml interleaved.yaml \
--model_class interleaved_class \
--add_pattern_to_guidance
Managing permissible values in LinkML enumerations
This code was developed in support of the IARPA Felix project and enhances ??? from linkml-model-enrichment