FamilySearch/GEDCOM.io

Defining compatibility without referencing specific features


The current compatibility guide suggests that compatibility with the specification is tied to supporting a wide set of features. I'd rather define it in terms of the alignment between whatever features an application supports and the files it reads and writes.

As a discussion proposal, perhaps we could define something like the following:


The FamilySearch GEDCOM 7 specification contains more than 150 standard structure types appearing in more than 1000 contexts, and many family history applications use only a subset of them. Additionally, many applications implement features that are not (yet) part of the specification. Because of this, compatibility with the specification depends on the features implemented by a given application.

The following compatibility categories are defined.

Import Compliance
The application can successfully import any file that conforms to the specification.
Export Compliance
Every file exported by the application with a HEAD.GEDC.VERS is a valid file as defined by the identified version of the specification.
Import Coverage
For each component of the application's data model, if a standard structure in the imported file corresponds to that component then that component is set to match that structure during import.
Export Coverage
For each component of the application's data model, if a standard structure is available in the specification to represent that component then that structure is used to represent that component during export.
Import Transparency
The application alerts the user of any structures in an imported file that are not fully imported into the application's data model.
Export Transparency
The application alerts the user of any structures in the application's internal data that are not fully represented in the exported file. This is trivially achieved if the application has lossless exports.
Lossless Exports
The application loses no data if it (1) exports a FamilySearch GEDCOM 7 file, (2) clears its internal state, and then (3) imports the file it exported. Achieving this may entail the use of extension structures in the exported file.
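One way to read the Lossless Exports definition is as a round-trip test. The sketch below is only an illustration: `ToyApp` and its methods (`snapshot`, `export_gedcom`, `clear`, `import_gedcom`) are hypothetical names, not anything defined by the specification, and the serialization is a simplified stand-in rather than a complete, valid GEDCOM 7 file.

```python
import copy


class ToyApp:
    """Hypothetical application whose whole state is a name -> birth-year map."""

    def __init__(self):
        self.people = {}

    def snapshot(self):
        # Deep copy so later changes cannot alter the saved state.
        return copy.deepcopy(self.people)

    def export_gedcom(self):
        # Illustrative serialization only; not a complete GEDCOM 7 file.
        lines = ["0 HEAD", "1 GEDC", "2 VERS 7.0"]
        for i, (name, year) in enumerate(sorted(self.people.items()), start=1):
            lines += [f"0 @I{i}@ INDI", f"1 NAME {name}", "1 BIRT", f"2 DATE {year}"]
        lines.append("0 TRLR")
        return "\n".join(lines)

    def clear(self):
        self.people = {}

    def import_gedcom(self, text):
        name = None
        for line in text.splitlines():
            parts = line.split(" ", 2)
            tag = parts[1]
            payload = parts[2] if len(parts) > 2 else ""
            if tag == "NAME":
                name = payload
            elif tag == "DATE" and name is not None:
                self.people[name] = int(payload)


def is_lossless_export(app):
    """Apply the three steps (export, clear, re-import) and compare state."""
    before = app.snapshot()
    exported = app.export_gedcom()  # step 1: export
    app.clear()                     # step 2: clear internal state
    app.import_gedcom(exported)     # step 3: import what was exported
    return app.snapshot() == before
```

A check like this only covers the data sets it is actually run on, so it can support a lossless-export claim but cannot prove it in general.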

Comments:

  • Import Compliance: what does "import" mean? If you silently read a file and throw pretty much everything away, is that importing or not?
  • Import Coverage: this can be hard to determine. If an app previously used extension structures in 5.5.1, then whether a standard structure new in 7.0 has the same semantics or subtly different ones can be non-obvious (take stillborn for one example that's already been discussed...). So if it continues to use extension structures in 7.0 for such cases, would it then not meet this category?
  • Export Coverage: if an app preserves imported extension structures, then does this mean it would fail this criterion if it didn't understand them enough to add standard structures?
  • Lossless Exports: what about also listing Lossless Imports as another category? Simple conversion utilities could be in that category.

I like the word coverage instead of compliance. Transparency implies a review of the software and how it communicates with the user. Likewise, I don't know how, or with what program, anyone would check Lossless Exports.

Discussed by steering committee. There was interest in this topic, but not yet consensus. Lossless imports seemed difficult to properly define. We noted that it would also be nice to include some kind of pre-purchase transparency, such that users could determine what specific structure coverage an application has without first using the application.

Below is a longer draft that I think addresses the above comments.


  • Non-violation: Does not violate the spec. A program that does nothing is trivially non-violative.

    • Import non-violation: Imposes no restrictions on files it imports beyond those in the spec itself. Can import any valid file, though it might not understand all parts of the file (see import coverage below).

    • Export non-violation: Every file exported by the application with a HEAD.GEDC.VERS is a valid file as defined by the identified version of the spec.

    • Meaning non-violation: Neither import nor export adds new information not previously present (other than metadata about the import or export itself).

      The most common cause of meaning violation is an inexact match between the spec and an application's internal data model. For example, consider an application that has a "religious rite" event type but does not have specific subtypes like baptisms. That application could import a BAPM as a religious rite, and could export a religious rite as an EVEN: those transformations lose some information, but do not add any spurious new information. That application could not export a religious rite as a BAPM, however: that would be adding a spurious assertion as to the type of rite it was.
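The religious-rite example can be sketched as a pair of tag-level mappings. Only the tags BAPM and EVEN come from the spec; the internal type names (`religious_rite`, `generic_event`) and the function names are hypothetical, and payloads and substructures are left out for brevity.

```python
# Hypothetical internal event types for an application with no baptism subtype.
IMPORT_MAP = {
    "BAPM": "religious_rite",  # loses specificity, but adds nothing new
    "EVEN": "generic_event",
}

# Export never picks a tag more specific than the internal type.
EXPORT_MAP = {
    "religious_rite": "EVEN",  # BAPM here would assert an unknown rite type
    "generic_event": "EVEN",
}


def import_event(tag):
    """Return the internal type for a GEDCOM event tag, or None if unsupported."""
    return IMPORT_MAP.get(tag)


def export_event(internal_type):
    """Return the GEDCOM tag used when exporting an internal event type."""
    return EXPORT_MAP[internal_type]
```

Importing a BAPM and exporting it again yields an EVEN, which is lossy but not meaning-violating under the definition above.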

  • Coverage: Uses all applicable parts of the spec. A program with a limited feature set has coverage if it implements the parts of the spec that relate to those features.

    • Import coverage: Every standard structure in an imported file is used to populate the appropriate internal state of the application except for structures representing data that the application lacks the ability to represent.

    • Export coverage: All of the application's internal state is represented in the exported file except for (a) state that the spec lacks standard structures to store and (b) state that the user has specifically requested the application not to export.

    • Extension coverage can be defined analogously on a per-extension basis, but cannot be defined for the unbounded set of "all extensions".

    In cases of inexact matches between the spec and the application's data model such that import or export of some state would result in loss of data specificity, an application can claim coverage if it ignores that data or if it converts it in a meaning non-violative way.

  • Transparent: informs the user of any potential data loss

    • Import transparency: If an imported file has structures (standard or extension) that are not being fully converted into the internal data of the application, the user is alerted to that fact and can access the list of such structures.

    • Export transparency: If an application has internal information that it will not fully represent in the exported file, the user is alerted to that fact and can access what information was not exported.

    • Feature transparency: Potential users can access a list of which standard structures defined in the spec the application supports and which it does not, and can do so without first purchasing, creating an account, or otherwise using the application. Including extensions the application supports in this list is recommended, but not required.

    Inexact matches between the spec and the application's data model result in something less than full conversion, and must be reported to qualify for transparency. Examples include importing a BAPM as a generic event or exporting some of the application's internal event types as generic EVEN structures.
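As a rough illustration of import transparency, an importer could build a report alongside the import itself. Everything named here is an assumption made for the sketch: the `FULLY_SUPPORTED` and `INEXACT` tables, the `ImportReport` class, and the simplification that a file is just a flat list of tags.

```python
from dataclasses import dataclass, field

# Hypothetical feature tables for one application.
FULLY_SUPPORTED = {"NAME", "BIRT", "DEAT"}
INEXACT = {"BAPM": "imported as a generic religious rite"}


@dataclass
class ImportReport:
    imported: list = field(default_factory=list)
    not_fully_imported: list = field(default_factory=list)  # (tag, reason) pairs


def import_with_report(tags):
    """Classify each tag and record anything that is not fully converted."""
    report = ImportReport()
    for tag in tags:
        if tag in FULLY_SUPPORTED:
            report.imported.append(tag)
        elif tag in INEXACT:
            # Converted, but with loss of specificity, so it must be reported.
            report.imported.append(tag)
            report.not_fully_imported.append((tag, INEXACT[tag]))
        else:
            report.not_fully_imported.append((tag, "not imported"))
    return report
```

The `not_fully_imported` list is what the user would be alerted to and able to inspect; an empty list corresponds to a file that was fully imported.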

  • Lossless: import/export cycles do not remove information

    • Lossless exports: The following sequence of steps does not lose data:

      1. Export a file from the application
      2. Clear all of the data from the application's internal state
      3. Import the exported file into the application

      It is expected that many applications will need to use extension structures to achieve lossless exports.

    • A specific file is losslessly imported if all of its data (in both standard and extension structures) is fully imported into the application's internal state; that is, if import transparency would have nothing to report. Because the set of extension types is potentially limitless, and because preserving unknown structures as-is does not qualify as importing, applications cannot claim that all of their imports will be lossless.

      Preserving unknown structures as-is does not qualify as importing those structures because doing that can result in inconsistent data. Many structures require some kind of consistency with other structures: some structurally (like CHIL/FAMC pairs) and others semantically (like BIRT.DATE of all the spouses of a FAM being earlier than the FAM.MARR.DATE). Thus preserving some unknown structures as-is while editing other structures can result in inconsistent data and is not recommended.
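The structural kind of consistency mentioned above (CHIL/FAMC pairs) can be checked mechanically. The sketch below assumes the data has already been parsed into two plain dictionaries, which is a hypothetical representation chosen for brevity: `fam_children` maps each family to its CHIL pointers, and `indi_famc` maps each individual to its FAMC pointers.

```python
def chil_famc_mismatches(fam_children, indi_famc):
    """Return sorted (family, individual) pairs linked in one direction only."""
    chil_links = {(fam, child) for fam, kids in fam_children.items() for child in kids}
    famc_links = {(fam, indi) for indi, fams in indi_famc.items() for fam in fams}
    # Symmetric difference: links present on exactly one side.
    return sorted(chil_links ^ famc_links)
```

If an application edits one side of such a pair while preserving the other side as an opaque, unknown structure, a check like this would start reporting mismatches, which is the inconsistency the paragraph above warns about.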