Hierarchical (or simply linked/inherited) metadata

Question

Opened this issue a year ago · 2 comments

Hi! Thanks for the cool site. A few years ago, at a standards-unification conference for biological data, we looked at csv-on-the-web as something we should all adopt. A lot of our data was from repeated experiments, so we started talking about a hierarchical version, i.e. you would:

1. Create a directory (on a disk, at a URL, in a zip file, etc.) with a metadata file marking it as a special "resource".
2. Have a metadata file for this directory (e.g. saying "lab = ...") and further metadata files for subdirectories ("experiment type = ..", "cell_type = ...", "temperature = ..."), so that each subdirectory can either add to or overwrite the metadata fields of its parent directory.
3. Finally, have the CSV metadata, which "inherits" all the metadata from the subdirectory it is stored in.

Do you know if there have been any efforts like this? Or some other mechanism to achieve similar goals? (I.e. a field in the json that says "please also include all of the stuff at this URI")?

Thanks in advance, sorry for abusing the issue system.

Answer 1 · 2023-06-06T09:12:28.000Z

Or some other mechanism to achieve similar goals? (I.e. a field in the json that says "please also include all of the stuff at this URI")?

To make it a more general question: is there any mechanism to import metadata from another document (so not necessarily a tree structure)?

Answer 2 · 2023-06-13T14:41:20.000Z

Hello!

The spec allows for object properties which can either be objects or references to URLs where the object definition may be found. This allows you to re-use metadata across tables, for example:

experiment-1.csv.json:

{
  "url": "experiment-1.csv",
  "tableSchema": "experiment-schema.json"
}

experiment-2.csv.json:

{
  "url": "experiment-2.csv",
  "tableSchema": "experiment-schema.json"
}

Or equivalently as a table group:

experiments.json:

{
  "tableSchema": "experiment-schema.json",
  "tables": [
    { "url": "experiment-1.csv" },
    { "url": "experiment-2.csv" }
  ]
}
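
To make the "object or reference" idea concrete, here is a minimal Python sketch of how a tool might resolve such a property. The helper name `resolve_object_property` and the column names in the schema are made up for illustration, and the sketch only handles relative paths on local disk; a real CSVW processor resolves the reference as a URL against the location of the metadata document.

```python
import json
import tempfile
from pathlib import Path


def resolve_object_property(value, base_dir: Path) -> dict:
    """Return an inline object as-is, or load it when `value` is a file reference.

    Hypothetical helper: only relative local paths are supported here.
    """
    if isinstance(value, dict):
        return value
    return json.loads((base_dir / value).read_text())


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Illustrative schema; the column names are assumptions, not from the thread.
    schema = {"columns": [{"name": "sample"}, {"name": "value", "datatype": "number"}]}
    (base / "experiment-schema.json").write_text(json.dumps(schema))

    table = {"url": "experiment-1.csv", "tableSchema": "experiment-schema.json"}
    resolved = resolve_object_property(table["tableSchema"], base)

print(resolved["columns"][1]["datatype"])
```

Either form gives the same result: an inline `tableSchema` object is returned unchanged, and a string is treated as a pointer to a document holding that object.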

The spec doesn't define how (or whether) metadata should be merged when the user provides overriding metadata. There are, however, inherited properties, which let you set column specification defaults at, for example, the table group level and override them with values for a specific table.
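
As a rough illustration of how inherited properties behave, the Python sketch below looks up the `null` property (one of the inherited properties in the CSVW vocabulary) for each table, preferring the table's own value over the table group's. The function `resolve_inherited` is a hypothetical helper, not part of any CSVW library, and the `"NA"` and `"-"` markers are made-up values.

```python
def resolve_inherited(prop, *levels):
    """Return `prop` from the most specific level that sets it.

    `levels` are ordered from most specific (e.g. a table) to least
    specific (e.g. the table group). Hypothetical helper for illustration.
    """
    for level in levels:
        if prop in level:
            return level[prop]
    return None


# Table group sets a default null marker; the second table overrides it.
group = {
    "null": "NA",
    "tables": [
        {"url": "experiment-1.csv"},
        {"url": "experiment-2.csv", "null": "-"},
    ],
}

null_markers = {
    table["url"]: resolve_inherited("null", table, group)
    for table in group["tables"]
}
print(null_markers)
```

Note that this only covers properties the spec lists as inherited; it does not apply to arbitrary custom fields such as "lab" or "cell_type".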

Judging by your examples, you may be thinking about your own metadata properties and not ones from the CSVW metadata vocabulary. In that case, you might be able to use provisions from the JSON-LD spec to achieve what you want. Just bear in mind that, despite the syntactic overlap, CSVW processors only need to support a subset of JSON-LD; crucially, this means the context is fixed. You may need bespoke processing if you go down this route.

You might also think about a pre-processing tool that generates CSVW annotations. This is what we did with Swirrl/table2qb: we use a registry of columns to generate a table schema suitable for a given CSV table. This lets us avoid repetition in the "source of truth" and still have spec-compliant outputs.
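
For the directory layout described in the question, such a pre-processor could look roughly like the Python sketch below. It is not how table2qb works; it is just one way to flatten a hierarchy into a per-CSV metadata document. The file name `meta.json`, the namespace `http://example.org/terms#`, and the field values are all assumptions. Custom fields are written as absolute URLs because the CSVW context is fixed, so new short names can't be declared.

```python
import json
import tempfile
from pathlib import Path

META_NAME = "meta.json"  # hypothetical per-directory metadata file name
TERM_BASE = "http://example.org/terms#"  # hypothetical namespace for custom fields


def inherited_fields(root: Path, csv_path: Path) -> dict:
    """Merge META_NAME files from `root` down to the CSV's directory.

    Deeper directories overwrite fields set by their parents.
    """
    directories = [root]
    for part in csv_path.parent.relative_to(root).parts:
        directories.append(directories[-1] / part)

    merged = {}
    for directory in directories:
        meta_file = directory / META_NAME
        if meta_file.is_file():
            merged.update(json.loads(meta_file.read_text()))
    return merged


def csvw_metadata(root: Path, csv_path: Path) -> dict:
    """Build a flat CSVW metadata document carrying the merged custom fields."""
    doc = {"@context": "http://www.w3.org/ns/csvw", "url": csv_path.name}
    for key, value in inherited_fields(root, csv_path).items():
        doc[TERM_BASE + key] = value
    return doc


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    run_dir = root / "heat-shock" / "run-1"
    run_dir.mkdir(parents=True)
    # Illustrative values only.
    (root / META_NAME).write_text(json.dumps({"lab": "Example Lab", "temperature": 37}))
    (root / "heat-shock" / META_NAME).write_text(json.dumps({"temperature": 42}))
    csv_path = run_dir / "experiment-1.csv"
    csv_path.write_text("sample,value\na,1\n")
    demo = csvw_metadata(root, csv_path)

print(json.dumps(demo, indent=2))
```

The hierarchy stays the single source of truth, and the generated `experiment-1.csv` metadata is self-contained, so an ordinary CSVW processor never needs to know about the directory structure.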

FWIW, I don't think this is really an abuse of the issues system. If you come up with a CSVW solution, it'd be great to hear about it on this thread and maybe even see it become a guide for csvw.org so others can learn from the example.