FamilySearch/GEDCOM

Extension registry


Now that we have a documented YAML format (https://gedcom.io/terms/format) I think it's time to revisit the idea of an extension registry.

Proposal: we allow community submissions of YAML files for extensions to a common repository where they may be easily located.

In addition to general interoperability gains, this might assist in an extension-to-standard workflow (#17) and in defining additional calendars (#38, #116) and events (#117).

Various questions I think we should answer before creating such a registry:

  1. Should it be part of this repo, the gedcom.io repo, or a different repo?

    Note that git hooks and GitHub Actions mean we can make this decision independently of whether, and where, the registry has a web presence.

  2. How should submissions to the registry be managed? Options include

    1. Anyone can submit a pull request to the registry; a team (the steering committee or a different team) decides which to merge
    2. Tool authors first verify their ownership of a URI namespace; they can then submit definitions in that namespace only
    3. An open web form allows anyone to submit definitions; if they comply with the formatting rules they are accepted automatically
  3. How should files be named?

    Existing YAML files all have the same prefix so their filenames are easy to define. But that won't be true for extensions. Some options include

    1. <tag>.yaml
    2. <HEAD.SOUR-tag>.yaml
    3. incremental number: first submitted is 1.yaml, next is 2.yaml, and so on
    4. submitter chosen
    5. registry maintainer chosen
    6. file name based on URI (replacing /, ?, and # with other characters)
    7. directory tree based on URI (skipping scheme and replacing ? and # with other characters)
    8. just one file with a list of all registry documents inside it
  4. What derivative files should be produced?

    1. Convert YAML to JSON, GEDC, XML
    2. TSV files like substructures.tsv and the others extracted from the standard
    3. Lists of all known URIs to use a given tag
    4. Lists of all known extensions to be produced by a given product
  5. Should the standardized concepts be included in the registry with the extension concepts?

  6. Should we create URIs in the extension tag registry namespace for extensions registered without a creator-defined URI?
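
To make options 6 and 7 of the file-naming question concrete, here is a minimal sketch of how a URI might be turned into a file name or a directory path. The function names and the choice of `_` as the replacement character are assumptions for illustration only; nothing here is an agreed convention.

```python
from urllib.parse import urlsplit

REPLACEMENT = "_"  # assumed replacement character; the proposal only says "other characters"


def uri_to_filename(uri: str) -> str:
    """Option 6: flatten the whole URI into one file name by replacing /, ?, and #."""
    for ch in "/?#":
        uri = uri.replace(ch, REPLACEMENT)
    return uri + ".yaml"


def uri_to_path(uri: str) -> str:
    """Option 7: drop the scheme, keep / as the directory separator, replace ? and #."""
    parts = urlsplit(uri)
    rest = parts.netloc + parts.path
    if parts.query:
        rest += "?" + parts.query
    if parts.fragment:
        rest += "#" + parts.fragment
    for ch in "?#":
        rest = rest.replace(ch, REPLACEMENT)
    return rest + ".yaml"


print(uri_to_filename("https://example.com/XYZ"))
print(uri_to_path("https://example.com/ext?v=1#XYZ"))
```

Two caveats with this sketch: option 6 leaves the `:` of the scheme in the file name, which some file systems reject, and any replacement scheme can make two different URIs collide on the same name.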

My initial thoughts...

  1. Should it be part of this repo, the gedcom.io repo, or a different repo?

No strong opinion here, but if some extensions might also warrant an addition to https://github.com/FamilySearch/GEDCOM/blob/main/version-detection/version-detection.md then having them be in the same repo would make PRs easier to review.

  2. How should submissions to the registry be managed? Options include

    1. Anyone can submit a pull request to the registry; a team (the steering committee or a different team) decides which to merge

Yes

    2. Tool authors first verify their ownership of a URI namespace; they can then submit definitions in that namespace only

I would like to see a process for third-party submissions, especially for known extensions used by popular apps that no longer have an active owner. As a comparable case, a URI scheme can be registered by a third party (see the process in https://www.rfc-editor.org/rfc/rfc7595), and there is a process for the owner to claim it later.

    3. An open web form allows anyone to submit definitions; if they comply with the formatting rules they are accepted automatically

A web form sounds like more work to create, maintain, and update, so I'd start with GitHub PRs for now.

  3. How should files be named?

    Existing YAML files all have the same prefix so their filenames are easy to define. But that won't be true for extensions. Some options include

    1. <tag>.yaml
    2. <HEAD.SOUR-tag>.yaml

Above sounds good to me.

    3. incremental number: first submitted is 1.yaml, next is 2.yaml, and so on
    4. submitter chosen
    5. registry maintainer chosen
    6. file name based on URI (replacing /, ?, and # with other characters)

I'd like to allow legacy (5.5.1) extensions in the registry, and those won't have URIs per se.

    7. directory tree based on URI (skipping scheme and replacing ? and # with other characters)

    8. just one file with a list of all registry documents inside it

  4. What derivative files should be produced?

    1. Convert YAML to JSON, GEDC, XML
    2. TSV files like substructures.tsv and the others extracted from the standard
    3. Lists of all known URIs to use a given tag
    4. Lists of all known extensions to be produced by a given product
  5. Should the standardized concepts be included in the registry with the extension concepts?

Yes, I'd put them in the same registry.

  6. Should we create URIs in the extension tag registry namespace for extensions registered without a creator-defined URI?

Offhand, I might say no, but it's ok if we find a good argument to do so.

I realize the question I'm asking is not really part of this issue thread but it has been bothering me since v7.0 was introduced.

Question:
How would the Extension Registry actually work when used by a specific application?

For example: a genealogy application receives a v7.0 GEDCOM file from the wild, i.e. a friend sends me a GEDCOM file and wants me to help them research part of a branch. My program imports the file and comes across an "Extension Tag" that it does not understand and does not know how to store in its database. What happens?

  • Is my program expected to reach out to the internet and look up the meaning of the tag at import time, and understand what to do with the unknown extension?
  • Is the user (me) expected to go to the internet after the import, look up the Extension in the Registry and try to get my copy of the program to understand the unknown tag?
  • Is the user (me) supposed to call the software company's support desk, tell them the definition of the unknown extension, and hope that some future update will support it?

How is this Extension Registry valuable to the import process?

I think of it primarily as an aid for developers. If a developer notices that a lot of files are being submitted with undocumented extension tag _XYZ or documented extension https://example.com/XYZ, the developer can check the registry to see what's known about that extension, including tips on where their code and/or database schema will have to change (via the superstructures field), how to parse the payload, descriptions of the extension's purpose and meaning, and suggested user interface text.

I expect it may also be used in some automated validators and compatibility checkers. I could imagine creating a tool that uses it directly to populate a dynamic user interface and automatically handle any extension in the registry, but I doubt that will be very common. Some tools also list tags they failed to import to the user, who could presumably use the registry to figure out what they all meant and thus exactly what data was lost, but I don't expect most users will do that.
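
As a rough illustration of the "list the tags that failed to import" use case, the sketch below scans GEDCOM lines for extension tags the importer does not support and attaches whatever a locally cached copy of the registry knows about them. The `REGISTRY` dictionary, its field names, and the sample values are all made up for this example; they are modeled loosely on the fields mentioned above (URI, superstructures, label) and are not the registry's actual schema or API.

```python
# Hypothetical local copy of registry entries, keyed by extension tag.
# One tag may map to several URIs, since different products can reuse a tag.
REGISTRY = {
    "_XYZ": [
        {
            "uri": "https://example.com/XYZ",                   # sample value
            "label": "Example extension",                       # suggested UI text
            "superstructures": ["https://example.com/PARENT"],  # sample value
        }
    ]
}


def report_unknown_tags(lines, supported_tags):
    """Return {tag: registry entries} for extension tags the importer cannot handle."""
    report = {}
    for line in lines:
        parts = line.strip().split(" ")
        # A GEDCOM line is: level, optional @xref@, tag, optional payload.
        idx = 2 if len(parts) > 2 and parts[1].startswith("@") else 1
        if len(parts) <= idx:
            continue  # malformed line; ignore it in this sketch
        tag = parts[idx]
        # Extension tags begin with an underscore.
        if tag.startswith("_") and tag not in supported_tags:
            report.setdefault(tag, REGISTRY.get(tag, []))
    return report


sample = ["0 @I1@ INDI", "1 NAME John /Doe/", "1 _XYZ hello", "1 _ABC x"]
for tag, entries in report_unknown_tags(sample, supported_tags=set()).items():
    known = ", ".join(e["uri"] for e in entries) or "not in registry"
    print(f"{tag}: {known}")
```

A developer or validator could use a report like this to decide which extensions are worth supporting; an end user would at least see which URIs the lost data corresponded to.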

Thanks. Based on what I've seen of how much effort various genealogy programs put into updating their database, interface, and additional data, I suspect that the incorporation of new "Extensions" will be slow!

Personally, I had hoped that a new record type would be added to the GEDCOM payload to work like a "Data Dictionary", similar to what was introduced in GEDCOM v5.3 but as a separate structure. It could help the import reconcile the extension more quickly, or give the user a specific message with a definition of the extension rather than a simple error.

Discussed in steering committee

  • We plan to start with GitHub pull requests to submit extensions
    • on a separate repository
  • We will allow third-party submissions, and flag them as such so that owners can overwrite them if they wish
  • File names
    • final decision with repository maintainer (we identified exceptions for most we discussed)
    • we will return to this point later
  • URIs
    • let owning submitter (but generally not third-party submitter) suggest a URI
      • we recommend <URL of documentation>#<tag name>
    • if no URI is supplied, we default to the URL of the YAML file in the registry
  • A shared registry for standard and extension tags will be simpler
    • and the g7:TAG URIs can be redirected to it
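
The URI rules above can be sketched as a small helper. This is only an illustration of the decision as summarized here: the function names, parameters, and example URLs are hypothetical, and "generally not" for third-party submitters is simplified to a strict rule.

```python
from typing import Optional


def recommended_uri(documentation_url: str, tag: str) -> str:
    """Recommended form: <URL of documentation>#<tag name>."""
    return f"{documentation_url}#{tag}"


def resolve_uri(yaml_url: str,
                suggested_uri: Optional[str] = None,
                third_party: bool = False) -> str:
    """Pick the URI for a registered extension.

    An owning submitter's suggestion is used if present. A third-party
    suggestion is ignored here (a simplification of "generally not").
    Otherwise the URL of the YAML file in the registry is the default.
    """
    if suggested_uri and not third_party:
        return suggested_uri
    return yaml_url


# Hypothetical registry location for the YAML file.
yaml_url = "https://registry.example/extensions/_XYZ.yaml"

owner_uri = recommended_uri("https://example.com/docs/extensions", "_XYZ")
print(resolve_uri(yaml_url, suggested_uri=owner_uri))
print(resolve_uri(yaml_url, suggested_uri=owner_uri, third_party=True))
print(resolve_uri(yaml_url))
```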

FamilySearch/GEDCOM-registries#11 updated the GEDCOM-registries repository to copy the extracted files there automatically.

This is now done