pha4ge/hAMRonization

Decide on a single 'authoritative' schema format

dfornika opened this issue · 5 comments

There are several schema definition technologies available to us:

  1. JSON Schema
  2. SALAD
  3. AVRO
  4. JSON-LD

Ideally we would have a single 'authoritative' schema, and any other schema could be automatically derived from it. Which schema definition technology would make the most sense to use as the authoritative schema? Would it be possible to derive all the others from it in a robust and automated way?

Validation using schemas is temporarily disabled while they get updated.

@cimendes do you have a preferred schema to use as the base? Since JSON output is already provided, shall we just use JSON Schema for now?

I've discussed this a bit with @dfornika and it would be great if the other schemas could be generated from a single "rule them all" schema, namely the SALAD one. I've been trying to do some research and I think it is possible to have the SALAD schema and then convert it to the other ones, including JSON. Unfortunately I haven't had the time to give this the proper attention it deserves... I'm sorry.

To me we can follow two paths:

  • Focus efforts on the SALAD schema implementation and use that to generate all others
  • Implement JSON schema validation (our bread and butter from the SARS-CoV-2 project)

The time commitment necessary for each is very different, and each has certain advantages over the other, such as ontology integration.

I would like to hear your input on this, @fmaguire

I'm not too bothered. Why don't we do a quick JSON Schema and revisit if we need other features later? That seems the quick, pragmatic option given that only a tiny fraction of users are likely to even touch the schema directly.

Closing issue as we're going for the good old JSON format
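
For anyone landing here later, this is a rough sketch of what the "quick JSON Schema" option could look like. It is not the project's actual schema or validation code: the field names and the `validate` helper are made up for illustration, and the checker only understands a tiny subset of JSON Schema (`type`, `required`, `properties`). A real implementation would presumably use a complete validator library rather than this hand-rolled one.

```python
import json

# Hypothetical schema: the field names are illustrative assumptions,
# not taken from the real specification.
SCHEMA = {
    "type": "object",
    "required": ["gene_symbol", "analysis_software_name"],
    "properties": {
        "gene_symbol": {"type": "string"},
        "analysis_software_name": {"type": "string"},
        "sequence_identity": {"type": "number"},
    },
}

# Maps the JSON Schema type names used above to Python types.
_JSON_TYPES = {
    "object": dict,
    "string": str,
    "number": (int, float),
}


def _matches(value, type_name):
    """Return True if value has the given JSON Schema type."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, but JSON booleans are
        # neither numbers, strings, nor objects.
        return False
    return isinstance(value, _JSON_TYPES[type_name])


def validate(record, schema):
    """Check a record against a small subset of JSON Schema.

    Returns a list of error messages; an empty list means the record passed.
    """
    if not _matches(record, schema["type"]):
        return [f"expected {schema['type']}"]
    errors = []
    for name in schema.get("required", []):
        if name not in record:
            errors.append(f"missing required field: {name}")
    for name, sub in schema.get("properties", {}).items():
        if name in record and not _matches(record[name], sub["type"]):
            errors.append(f"{name}: expected {sub['type']}")
    return errors


# Made-up records, parsed from JSON text as a report file would be.
good = json.loads(
    '{"gene_symbol": "blaOXA-48", '
    '"analysis_software_name": "example_tool", '
    '"sequence_identity": 99.5}'
)
bad = json.loads('{"gene_symbol": 42}')

print(validate(good, SCHEMA))
print(validate(bad, SCHEMA))
```

The schema itself is plain JSON-compatible data, so it could live in a standalone `.json` file and be revisited later if features such as ontology integration turn out to be needed.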