rootsdev/genscrape

Where should SourceDescriptions be attached?

Closed this issue · 2 comments

While working on #33 I realized that we need to be more deliberate about where we attach SourceDescriptions. FamilySearch generates its own SourceDescriptions and attaches them to the root GEDCOM X element. For all other sites we have been generating one SourceDescription and attaching it to every person and relationship in the document. We did that before realizing that the root element has a `description` property.
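To make the two shapes concrete, here is a rough sketch using plain objects that follow the GEDCOM X JSON property names (`description`, `sourceDescriptions`, `sources`). The ids, title, and URL are made up for illustration, and this is not actual genscrape or FamilySearch output.

```javascript
// Illustrative only: plain objects shaped like GEDCOM X JSON.
// The id, title, and URL below are invented for this example.
const sourceDescription = {
  id: 'source-1',
  about: 'https://example.com/records/123',
  titles: [{ value: 'Example census record' }]
};

// Root-level attachment (what FamilySearch does): the document's
// `description` property references the SourceDescription by id.
const rootAttached = {
  description: '#source-1',
  sourceDescriptions: [sourceDescription],
  persons: [{ id: 'p1' }, { id: 'p2' }],
  relationships: []
};

// Entity-level attachment (what we have been doing for other sites):
// every person carries a SourceReference to the same description,
// and the root has no `description` property.
const entityAttached = {
  sourceDescriptions: [sourceDescription],
  persons: [
    { id: 'p1', sources: [{ description: '#source-1' }] },
    { id: 'p2', sources: [{ description: '#source-1' }] }
  ],
  relationships: []
};
```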

I believe we should use the root-level `description` property. But should we continue attaching that same SourceDescription to all persons and relationships?

Let's first consider the case of a typical census record. Say you have a couple and three children. genscrape is used to extract the data while viewing the mother's record, which includes a household table that lists all other members of the household but only includes their names and ages. Should all other household members be linked to that same source? That's probably not what you want, because they have their own records.

But in the case of a marriage record, depending on how the record repository displays the data, the parents of the couple might not have their own records. In that case you would want all persons and relationships to have the root SourceDescription attached to them.

Online trees are similar to the census case. If you attached the root SourceDescription to all persons in each document and used genscrape to process every person in the tree, each person would end up with a SourceDescription for every person they're related to. That's not what we want.

We have a few choices:

  1. Never attach the root SourceDescription to any other entities in the document. Consider it implicit that it also applies to everything in the document to varying degrees and expect apps to interpret it as they please.
  2. Always attach the root SourceDescription to all entities in the document which support sources.
  3. Have genscrape site processors attach the root SourceDescription to none, some, or all entities based on what is deemed most appropriate for the record type.
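The three choices could be sketched roughly as follows. `applyAttachmentPolicy`, the policy names, and the `appliesTo` callback are hypothetical names invented for this illustration; nothing like this exists in genscrape today. The sketch assumes plain GEDCOM X JSON objects whose root `description` is already set.

```javascript
// Hypothetical helper, not part of genscrape.
// policy: 'never' (option 1), 'always' (option 2), 'per-record' (option 3).
// appliesTo: used only for 'per-record'; decides entity by entity.
function applyAttachmentPolicy(doc, policy, appliesTo = () => false) {
  const ref = doc.description; // e.g. '#source-1'
  const entities = [...(doc.persons || []), ...(doc.relationships || [])];
  for (const entity of entities) {
    const attach =
      policy === 'always' ||
      (policy === 'per-record' && appliesTo(entity));
    if (!attach) continue;
    entity.sources = entity.sources || [];
    // Avoid adding a duplicate reference to the same description.
    if (!entity.sources.some(s => s.description === ref)) {
      entity.sources.push({ description: ref });
    }
  }
  return doc;
}

// Census example: only the person whose record is being viewed
// (marked `principal`) gets the root source under option 3.
const census = {
  description: '#source-1',
  persons: [
    { id: 'mother', principal: true },
    { id: 'child-1' }
  ],
  relationships: []
};
applyAttachmentPolicy(census, 'per-record', e => e.principal === true);
```

With `'never'` the document is returned unchanged and consumers are expected to read the root `description` themselves.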

I'm leaning towards option 1.

I still believe that option 1 above is ideal, but it's a breaking change. Currently we generate a SourceDescription and attach it to all nested entities (persons, relationships, etc.), but we don't attach it to the root document. So we can fix this, for now, by attaching that SourceDescription to the root too.
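A minimal sketch of that interim fix, assuming plain GEDCOM X JSON objects. `attachToRoot` is a hypothetical name, not an existing genscrape function. It sets the root `description` and leaves the existing references on persons and relationships untouched, so nothing breaks for current consumers.

```javascript
// Hypothetical helper, not part of genscrape.
function attachToRoot(doc, sourceDescription) {
  doc.sourceDescriptions = doc.sourceDescriptions || [];
  // Make sure the description itself is in the document exactly once.
  if (!doc.sourceDescriptions.some(sd => sd.id === sourceDescription.id)) {
    doc.sourceDescriptions.push(sourceDescription);
  }
  // Point the root at it; nested entity references are left as they are.
  doc.description = '#' + sourceDescription.id;
  return doc;
}

// Example with made-up ids and URL.
const source = { id: 'source-1', about: 'https://example.com/records/123' };
const doc = attachToRoot({
  sourceDescriptions: [source],
  persons: [{ id: 'p1', sources: [{ description: '#source-1' }] }]
}, source);
```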