FamilySearch/GEDCOM

What is the meaning when a structure's name/label does not equal its definition/description?

tychonievich opened this issue · 10 comments

Most structures in the current spec have both a short word-or-two name or label (the "name" column in sections 3.3.1–3; the section headers in section 3.3.4) and a description (the "description" column in sections 3.3.1–3; the text beneath the section headers in 3.3.4). Some also have a different label and different description for the structure and its substructures in section 3.2.

In some cases these do not fully align. As a non-exhaustive selection of recent examples,

  • #314 discusses how g7:NATI's name is "nationality" but its description is broader than just nationality.
  • #473 includes many topics, including various thoughts about the relative importance of the label vs the description of g7:NICK that led to #482.
  • #504 describes a specific "official public notice given that 2 people intend to marry" that is not a "marriage bann." Some comments seem to suggest these should use g7:MARB, others that they should not use g7:MARB.

How should these and other similar issues be handled?

Some options I've considered:

  1. Define "and" semantics. The only correct use of a structure agrees with all of its various labels, names, and descriptions. Any other use will appear incorrect in some context.
  2. Define "or" semantics. Data might have been entered by someone seeing any one of the definitions in isolation, so asserting which one is being followed is misleading.
  3. Define "description supersedes label." A few-word label cannot provide the nuance and clarity that longer text can. If the label and description appear to be at odds, the description is correct.
  4. Treat each such case as an ambiguity to be patched. The resolution of the ambiguity could align with any of the three approaches noted above.
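To make the difference between the first three options concrete, here is a minimal sketch. It assumes a hypothetical model in which each place the spec defines a structure (label, description) is a predicate over a candidate use; the `MARB` entries are simplified stand-ins for illustration, not the spec's actual wording.

```python
from typing import Callable, Dict

# Hypothetical model: each definition of a structure is a predicate that
# says whether a given use fits that definition.
Definition = Callable[[str], bool]


def fits_and(defs: Dict[str, Definition], use: str) -> bool:
    """Option 1: a use is correct only if every definition accepts it."""
    return all(d(use) for d in defs.values())


def fits_or(defs: Dict[str, Definition], use: str) -> bool:
    """Option 2: a use is acceptable if any one definition accepts it."""
    return any(d(use) for d in defs.values())


def fits_description(defs: Dict[str, Definition], use: str) -> bool:
    """Option 3: only the description decides; the label is ignored."""
    return defs["description"](use)


# Simplified stand-in for g7:MARB (an assumption for illustration):
# the label covers only banns, the description covers any official
# public notice of intent to marry.
MARB: Dict[str, Definition] = {
    "label": lambda use: use == "marriage bann",
    "description": lambda use: use in {
        "marriage bann",
        "civil notice of intent to marry",
    },
}

notice = "civil notice of intent to marry"
print(fits_and(MARB, notice))          # False
print(fits_or(MARB, notice))           # True
print(fits_description(MARB, notice))  # True
```

Under this toy model the notice from #504 is rejected by "and" semantics but accepted by the other two, which is the disagreement seen in the comments there.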

Resolved PRs:

  • in 7.0.0 we picked the narrower definition of NATI from 5.5.1; in #187 we undid that in the description, using the broader definition instead, without changing the label; we revisited this in #319 further leaning into the broader description.
  • in #242 we changed the description "there has been a claim by some that this child does not belong to this family, but the linkage has been proven" to match the broader label "proven."
  • in #311 we changed the label "burial" to match the broader definition "Depositing the mortal remains of a deceased person."
  • in #482 we acknowledged conflicting definitions without changing either label or description, instead adding a note pointing out that they differed and that that leads to difficulty in interpreting extant data.

Others I think might be problems:

  • g7:ADOP has a broader label "adoption" than its description (the description adds in legal approvals)
  • g7:CHR and g7:CHRA have broader definitions (with no reference to specific religions) than their labels "christening" (implying Christianity)
  • g7:DIV has a broader label "divorce" than its description (which requires civil action)
  • g7:ENGA has a mismatch: the label "engagement" would apply to secret engagements too but not to a later announcement of a secret engagement
  • g7:MARB has a broader definition than its label "marriage bann" which is just one type of official announcement of forthcoming marriage
  • g7:MARC has a mismatch: "marriage contract"s might include the elements listed in the description, but do not need to do so; and the elements listed in the description could occur separately from the marriage contract.
  • g7:CAST has a broader definition than its label "caste", including many forms of rank and privilege other than caste
  • g7:IDNO has a broader definition than its label "identifying number", allowing for non-numeric identifiers too.
  • g7:AUTH has a broader definition than its label "author", including non-authorship roles in contributing to a work.
  • g7:CALN has a broader definition than its label "call number", allowing for non-numeric identifiers too.

I need some time to digest these, and unfortunately I’m not at home and have a very limited internet connection. All of these tags are examples of my continued frustration with the original GEDCOM bias toward a very narrow view of genealogical data recording. GEDCOM is at a crossroads. Do we continue on the current path, based on prior bias, and simply add new tags to become more inclusive of newly “discovered” terms and conditions that will forever pop up? Or do we step back, analyze what “kind” of items we are recording, and bring together similar terms, with an eye toward the fact that every culture/religion/government has similar, but not exactly the same, functions, creating recording tags that define a general concept with attributes that more specifically define the concept?

For those of you who understand the basic concept of Object Oriented Programming: a general “class” does not define an instance until the attributes of that class are set. We know what a car class is, but it is not instantiated until we set the attributes to indicate its make, year, and color! The same can be true for any of the current tags that have narrow definitions but broader interpretations when culture/language/religion/government get involved! Right now our classes are not named “car”; rather, they are called “yellow 1995 chevy”!
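As a rough illustration of the analogy, the sketch below puts a `Car` class next to a general event class. The `Event` class and its `category` and `kind` fields are hypothetical names chosen for this example; nothing like them exists in the GEDCOM spec.

```python
from dataclasses import dataclass


@dataclass
class Car:
    """The general class; an instance exists only once attributes are set."""
    make: str
    year: int
    color: str


@dataclass
class Event:
    """Hypothetical general "class tag"; not a structure in the spec."""
    category: str  # the general concept shared across cultures
    kind: str      # the culture- or jurisdiction-specific variant


# One general class, many instances that differ only in their attributes.
chevy = Car(make="Chevrolet", year=1995, color="yellow")
banns = Event(category="marriage announcement", kind="banns")
notice = Event(category="marriage announcement", kind="civil notice")

print(banns.category == notice.category)  # True
print(banns.kind == notice.kind)          # False
```

The two events share one general concept and differ only in an attribute, instead of each needing a tag of its own.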

Throw out the old tag that is too restrictive, develop a new tag not based on a word that carries a definition of its own, and replace the old tag with this “class tag” containing a series of similar concepts and strong definitions. If Adoption has multiple interpretations, create a tag called PIP205 with like-kind subterms that are specific to a particular culture. But I’m sure this is too radical!

Why do we have two tags, one for social security number and another for national identity number? They are similar, just used by different governments!

I agree with the direction you suggest, @Norwegian-Sardines. I also am mindful that we should continue to maintain 7.0 even after replacing part (or all) of it with better systems, so even if an improved system bypasses this question in the future I think we'll still need to answer it for 7.0.

More examples:

  • GIVN has a broader label "given name" than its descriptive text, which adds on "used for official identification of a person", a characterization that not all given names share
  • SURN has a broader description that allows any name passed on by family members, which would seem to include the entire name of families that name children after their parents and grandparents and so on, but that is not included in its label "surname"

Discussed in steering committee 9 JUL 2024

  • Some users interpret structures narrowly and want many specific variants; some users interpret structures broadly and want few of them.
  • The specific variant option makes it challenging to define, distinguish, implement, and help users find them; fewer broader structures that cover groups of related event types would be better.
    • But we should have the option of more specific subtypes, either with current TYPE or an enumerated KIND substructure instead.
    • This suggests we should draft a proposal for the next version of the standard with a smaller set of structure types
  • When parts of the spec describing the same component do not align,
    • Clarify so they do not conflict
    • Clarify toward broader definition or union of cases
    • Do not try to enumerate all specifics; general categories, possibly described with some examples, except in cases where tags overlap and require specificity to be distinct

In cases where the tag means several things but a strict interpretation of the word is normally taken, we could use the strict definition, but add the caveat that others may interpret the tag more loosely (“for example …”), which can be used here and will be addressed more completely at a later time!

If you and “the committee” like the idea of fact.KIND, one place to incorporate the concept without too much hand-wringing would be when adding a new tag.

For instance, a “military” tag: we have seen requests for multiple new tags based around military enlistment and military war action/participation.

We could develop a list of military related events and facts (I think we have that already and I probably commented on it as well) and put the concept into v7.1 with the idea that others will follow as we find them.

Since I’ve already used fact.TYPE for years to add meaning to the current set of tags (for example, MARR.TYPE contains “common-law”, “civil”, or “religious”), we would need to discuss how users like myself get from using TYPE to KIND!
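One way that migration could look is sketched below. It assumes a hypothetical enumerated KIND for MARR; the enumeration values and the `type_to_kind` helper are inventions for illustration, since no KIND substructure is defined in the current spec.

```python
from typing import Optional

# Hypothetical enumeration for a MARR KIND substructure. Both KIND itself
# and these values are assumptions made for this sketch.
MARR_KINDS = {"COMMON_LAW", "CIVIL", "RELIGIOUS"}


def type_to_kind(type_text: str) -> Optional[str]:
    """Map a free-text TYPE payload to an enumerated KIND, or None."""
    candidate = type_text.strip().upper().replace("-", "_").replace(" ", "_")
    return candidate if candidate in MARR_KINDS else None


print(type_to_kind("common-law"))   # COMMON_LAW
print(type_to_kind(" Civil "))      # CIVIL
print(type_to_kind("handfasting"))  # None
```

Values that do not map to the enumeration return `None`; an importer could leave those as free-text TYPE rather than guess, so no existing data is lost.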

One of the things that I thought we were doing by expanding the use case of the current tags with expanded definitions was to provide applications a means to continue to use the old design of GEDCOM by bringing together the various “custom tags” back into the mainstream GEDCOM tag set!

If this was not the intention of these expanded definitions, and including them causes “hand wringing” amongst the committee or the software industry, then we should rethink expanding the definitions that are too far away from the definition of the actual tag!

Personally, I would be against changing back to a more precise definition of the tag word, but if the majority wants that then it should be so.

We should however be mindful that this is a genealogical implementation and any definition of the tag must only include concepts that are genealogical and not include outside influences not genealogical in nature.

For example: while I agree that “nickname” has definitions used outside of genealogy such as those used in IT for signon or screen name identification, that definition should not “color” the definition we use in genealogy to define what GEDCOM understands a nickname to be.

If a value for a tag can include any alphanumeric and special characters, then we need to remove the word “number” from the definition and say “value”. The IDNO tag could be a driver’s license, and these could include letters and dashes. Same with CALN: library call numbers can include letters. Why do we librarians call them call numbers even when they include a letter? Old term, new use case. We don’t let it bother us!

I understand that GEDCOM originated from an American church to standardize historical American records, so all the use cases for conservative American structures are well supported. This results in many specific tags that could easily be classified as one.

For example ENGA, MARB, MARC, MARL, MARS, DIV, DIVF, MARR, and finally an EVEN as a placeholder for all exceptions. All of these answer one question: is there a bond between two people (or has it ended)?
Or BAPL, CONL, INIL, ENDL, SLGC and SLGS, which are specific to a single church and exclude similar events from other religions.

I think it would be good to deprecate specific tags and make them more general. But I don't mind if the general tag's name does not match its new purpose, as long as it matches the intention. Use INDI for a person, FAM for a group and MARR for a bond. SPSE for the top of the hierarchy and CHIL for the bottom.

And let application-builders build modules to accommodate cultural differences, but add tags that help cultural structures. And try to look at function, not at definition first. NICK is fine for daily names.