FamilySearch/gedcomx

Add Race as a known fact type

Closed this issue · 12 comments

Ethnicity is supported, but not race. Ethnicity and race are not the same thing.

👍

Race is not objectively measurable. Further, how many generations after a 'cross' does it take for a line to be 'pure' again? What blood quantum qualifies, and measured from what zero point? According to geneticists, there is no validity in 'race' classification schemas, not even morphological ones.

(That is, this is bad data. Don't use it.)

Yes, but historical (and even some modern) documents frequently ask for race. We need a way to share that information.

Agree with Justin: the ability to document a person's race as it appeared in historical documents is really important, especially in the US, where we have such a messed-up history of trying to fit people into categories. Being able to document in a GEDCOM what historical documents said does not imply that one believes the construct is all that useful from a genetic point of view.

Case in point: by 2016 standards, my husband's Race is white, but his Ethnicity is half-Ashkenazi Jewish and half-Sephardic Jewish. Some members of the Sephardic side of his family, originally from the island of Rhodes in the Aegean Sea, lived in Portland, Oregon in the 1920s and 1930s, when the state had a significant Klan presence. This is probably why his family was legally enumerated as "Octoroon" on the 1920 US Census. "Octoroon" is an archaic racial word from the US South that once had the specific legal meaning "one-eighth African-American", and it was not even supposed to be a legal option on the 1920 census. The family was then racially enumerated as "Turkish" in 1930, which was at least more accurate (although by that year their original hometown had become Italian under Mussolini), but that was not supposed to be a valid legal racial category on that census either. And then in 1940 they were legally considered white. ¯\_(ツ)_/¯

In other words, this data is messy, often has painful associations, and is definitely worth documenting for historical reasons, even if it is sometimes totally orthogonal to ethnicity, religion, language, nationality, or actual genes.

We also used to assume:

  • Sex is immutable
  • All children in a family are genetically descendants of the two adults
  • All wives take their husband's last name

These assumptions have resulted in tens of thousands of horribly wrong GED files floating around the internet. Bad data is bad data; it doesn't matter whether it was historically collected or not. GED is not about retaining historic records, but about building records of history.

Use good data. Encourage good data models which inherently resist subjective interpretation.

Various documents list my family's surname spelled in various ways in at least three languages. I can and should document those in my GEDCOM, even if none of them is the modern-day American spelling.

Various documents list my great-grandmother's age and implied estimated birth year as various numbers and years. I can document those in my GEDCOM even if I know those are wrong (because I've seen her birth certificate) and I know she eventually shaved years off her age. But her lies affected many historical documents, including her death certificate, so shouldn't I document them too? I'm not subscribing to her revisionism, I'm just documenting that it existed.

Various documents list my husband's family's race in various ways, even though, as with the "Octoroon" example above, this was totally dependent on local factors and had no relation to the truth. Why shouldn't I be able to document that data, too? Saying that something existed in a historical document is not the same thing as believing it or subscribing to its accuracy or relevance.

Or to put it another way, how would you suggest that we document in a GEDCOM what a census record may have stated about someone's race?

Use good data. Encourage good data models which inherently resist subjective interpretation.

Yup. All of it, including data about distinctions you don't agree should be made.

Those bad GEDCOMs floating around the internet exist not because people included evidence of race, ethnicity, or tribe. They exist because people made invalid assumptions and failed to exhaustively search out all of the evidence, carefully analyze it, and resolve all of the conflicts.

Regardless of what we believe now about the nature of race, ethnicity, and tribe, our ancestors made records containing those distinctions, and ignoring them now leads to incorrect conclusions and to those bad GEDCOMs.

Source, not fact.

That is, race is documented as part of a census event (http://gedcomx.org/Census), which is supported by the census source, and that source includes the textual description of 'race'. This also reduces redundancy within the GED file (normalizing /Race against each /Census that includes a race value).

You are perhaps getting confused by applying dictionary definitions to our field names.

The source is the document. The event captures the place, time, and action that the document records. The facts include the characteristics of the person (or persona, if you like) found in the document. We evaluate those "facts" in the context of the document and of the legal and social environment in which it was created. That lets us distinguish the person described from everyone else in that historical setting, so that we can reliably combine the record with other records about that historical person and create a narrative about his or her life and family.
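To make that division concrete, here is a minimal Python sketch of how one census entry might be laid out as a GEDCOM X-style JSON document: a source description for the document, a Census event for the place and time it records, and a persona whose facts each cite that source. It assumes the JSON property names `sourceDescriptions`, `events`, `persons`, `facts`, and `sources` as I understand the GEDCOM X serialization, and shows only a small subset of properties. The `census_record` helper, the IDs, and the citation text are hypothetical, and `http://gedcomx.org/Race` is the fact type being proposed in this issue, not an existing known type.

```python
import json

# Assumption: the Race fact type URI proposed in this issue.
RACE = "http://gedcomx.org/Race"


def census_record(source_id, person_id, event_id, year, place, race_as_recorded):
    """Hypothetical helper: model one census entry as source + event + persona."""
    return {
        # The source: the document itself (citation text is illustrative).
        "sourceDescriptions": [{
            "id": source_id,
            "citations": [{"value": f"{year} US Federal Census, {place}"}],
        }],
        # The event: the place, time, and action the document records.
        "events": [{
            "id": event_id,
            "type": "http://gedcomx.org/Census",
            "date": {"original": str(year)},
            "place": {"original": place},
            "roles": [{"person": {"resource": f"#{person_id}"}}],
        }],
        # The persona: characteristics stated in the document, each citing it.
        "persons": [{
            "id": person_id,
            "sources": [{"description": f"#{source_id}"}],
            "facts": [{
                "type": RACE,
                "value": race_as_recorded,  # recorded verbatim, not endorsed
                "date": {"original": str(year)},
                "sources": [{"description": f"#{source_id}"}],
            }],
        }],
    }


doc = census_record("S1920", "P1", "E1920", 1920, "Portland, Oregon", "Octoroon")
print(json.dumps(doc, indent=2))
```

Because the race value is a fact that points back at `#S1920`, a later record stating something different can be added as another sourced fact rather than overwriting this one.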

Yes, I may be confused by misapplying common language to your standard's jargon.

As I understood the standard:

  • A person or relationship object exists
    • may be associated with one or more relationships
    • may be associated with one or more events
    • may have one or more facts
    • may be associated with one or more sources or documents
  • An event object exists
    • may be associated with one or more persons or relationships
    • may be associated with one or more sources or documents
  • A source may be primary, secondary, or questionable evidence.
    • must include a citation
  • A document may be primary, secondary, or questionable evidence.
    • must include a text value
  • A fact is a given value of a known type, a sort of object property.
    • is dependent on a person or a relationship
    • may have one or more supporting sources
    • may be associated with one or more documents
    • meets the Criteria for New Fact Types (fact-types-specification.md, section 3)

But, in fact, some of this appears to be wrong. The SourceDescription and Document top-level data objects confuse me a lot, and I do not see where they are directly associated with a person, relationship, or event. Even so, the documentation objects can clearly accommodate recording a historical classification without applying it as a personal fact, e.g., follower of Islam vs. Infidel in Moorish Spain (a taxation classification).

However, I believe the Criteria for New Fact Types does address the issues of ambiguity I raised.

@justincy Re: referred to as _____

Not sure how best to add the data to the model, but one possibility would be identity variables, specifically identifiedAs and selfidentifiedAs. Use case example: an individual immigrating to the USA was identified as Swedish, self-identified as Norwegian (Norway then being part of Sweden), and was later identified as Chinese (due to the Sami phenotypic expression of epicanthic folds) and as German (due to a language misunderstanding).

Okay, so activity seems to have settled on this thread. I'm not sure I've accurately distilled all the noise, but I think that the generally-accepted conclusion is that Race should be added to the spec as a known fact type because it meets the Criteria for New Fact Types:

  • It clearly demonstrates a useful purpose within the context of genealogical research because the term is used in historical genealogical documents.
  • It does not overlap the definition of any existing fact type.
  • It demonstrates reasonable applicability; the race data recorded on historical documents needs to be captured.
  • It demonstrates reasonable applicability across differing geographic, cultural, and regional contexts.
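As an illustration of what sharing this data could look like once the type exists, here is a minimal Python sketch of the family from the Portland census example earlier in the thread, with one Race fact per census and each fact citing its own source. It assumes the new type is published as `http://gedcomx.org/Race` (the actual identifier is whatever the submitted change defines) and uses GEDCOM X-style JSON property names. The helpers `race_fact` and `recorded_races`, the source IDs, and the exact value strings (paraphrased from the example) are hypothetical. The sketch deliberately lists conflicting values side by side without resolving them, since how applications interpret the data is left to them.

```python
# Assumption: URI of the proposed Race fact type.
RACE = "http://gedcomx.org/Race"


def race_fact(value, year, source_id):
    """Hypothetical helper: one Race fact exactly as a document recorded it."""
    return {
        "type": RACE,
        "value": value,
        "date": {"original": str(year)},
        "sources": [{"description": f"#{source_id}"}],
    }


# The same family as enumerated on three successive censuses.
person = {
    "id": "P1",
    "facts": [
        race_fact("Octoroon", 1920, "S1920"),
        race_fact("Turkish", 1930, "S1930"),
        race_fact("White", 1940, "S1940"),
    ],
}


def recorded_races(person):
    """Return (date, value, source) for every Race fact, without resolving conflicts."""
    return [
        (f["date"]["original"], f["value"], f["sources"][0]["description"])
        for f in person.get("facts", [])
        if f.get("type") == RACE
    ]


for date, value, source in recorded_races(person):
    print(date, value, source)
```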

@Amgine0's points about how our understanding of the Race concept (among others) has evolved over the years are acknowledged and should be taken into consideration by developers when deciding how to implement their particular genealogical applications. However, the intention of the spec is to facilitate the sharing of genealogical data, including data that has been captured from historical documents which declare Race. How applications handle that data is beyond the scope of the spec.

I have submitted #297 for your consideration. I'll wait a few more days after any further discussion settles before I merge.