FamilySearch/GEDCOM

All Record Types and "Inline Notes" should include a RESN tag.

Closed this issue · 22 comments

Any Record Type or Personal Datapoint that could potentially contain sensitive data should provide for restrictions on data sharing! At present, only the Family_Record, Individual_Record, and Multimedia_Record, along with any Event_Detail, have the g7:RESN tag type.

Because Source_Records, Shared_Note_Records, and Repository_Records, as well as "Inline Notes", can contain sensitive data, these record types and structures should also include the g7:RESN tag.

It should be considered whether RESN should have a special role within the GEDCOM structure, similar to CONT.

With the increasing use of web-based genealogy applications, more and more people are working on shared family trees and family trees are shown to a lot of visitors. In such cases the RESN tag is essential for compliance with data protection regulations and for the protection of privacy.

Proposal: A RESN tag should generally be used to protect the hierarchical structure above it. A '1 RESN' protects a complete record, a '2 RESN' protects a tag at level 1, a '3 RESN' protects a tag at level 2, and so on. In theory, parts of a NOTE structure could also be marked as private with a RESN tag (INDI.DEAT.NOTE.CONT.RESN).
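
To make the proposed rule concrete, here is a minimal Python sketch of how an exporter might apply it: any structure that has a RESN substructure is dropped together with everything below it. The helper names (`parse_line`, `build_tree`, `strip_restricted`, `serialize`) and the choice to hide only on CONFIDENTIAL and PRIVACY are assumptions made for illustration; nothing here is taken from an existing program or from the specification.

```python
# Values treated as "hide on export" in this sketch (an assumption).
HIDE_VALUES = {"CONFIDENTIAL", "PRIVACY"}


def parse_line(raw):
    """Split one GEDCOM line into (level, xref, tag, value)."""
    tokens = raw.strip().split(" ")
    level = int(tokens[0])
    rest = tokens[1:]
    xref = ""
    if rest[0].startswith("@"):  # record lines look like "0 @I1@ INDI"
        xref, rest = rest[0], rest[1:]
    return level, xref, rest[0], " ".join(rest[1:])


def build_tree(lines):
    """Nest the flat lines into a tree, using the level numbers."""
    root = {"level": -1, "xref": "", "tag": "ROOT", "value": "", "children": []}
    stack = [root]
    for raw in lines:
        level, xref, tag, value = parse_line(raw)
        node = {"level": level, "xref": xref, "tag": tag,
                "value": value, "children": []}
        while stack[-1]["level"] >= level:  # climb back up to the parent
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


def is_hidden(node):
    """True if a direct RESN child carries one of the hiding values."""
    for child in node["children"]:
        if child["tag"] == "RESN":
            values = {v.strip().upper() for v in child["value"].split(",")}
            if values & HIDE_VALUES:
                return True
    return False


def strip_restricted(node):
    """Drop every structure whose own RESN child marks it as hidden."""
    node["children"] = [strip_restricted(c) for c in node["children"]
                        if not is_hidden(c)]
    return node


def serialize(node, out=None):
    """Turn the tree back into GEDCOM lines."""
    if out is None:
        out = []
    for child in node["children"]:
        parts = [str(child["level"]), child["xref"], child["tag"], child["value"]]
        out.append(" ".join(p for p in parts if p))
        serialize(child, out)
    return out


lines = [
    "0 @I1@ INDI",
    "1 NAME John /Doe/",
    "1 DEAT",
    "2 DATE 1 JAN 1990",
    "2 CAUS (sensitive)",
    "3 RESN CONFIDENTIAL",
    "1 NOTE Public remark",
]
exported = serialize(strip_restricted(build_tree(lines)))
print("\n".join(exported))
```

In this example the '3 RESN' line removes only the CAUS structure at level 2; the DEAT event and its DATE are still exported.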

Furthermore, the list of enumerations of the RESN tag should be adapted to the needs of web-based genealogy applications. For example, NONE could indicate records of a public individual who is still alive.

As you might have seen, I just added something like this for NOTEs, see #497.
In two Dutch programs, the privacy problem inside a NOTE is dealt with by adding special characters inside the NOTE, so that part of its text can be protected.

It has been done that way for many years, because there was no proper solution for it.

My opinion:
Don't make part of a NOTE private. That might complicate the NOTE processing. I say that even though I have used it for many years. But that was only because those programs could have only 1 NOTE at certain places, and not many NOTEs as GEDCOM allows. So the only possibility we had was to make part of that 1 NOTE private, by surrounding the private part with special characters.
If a user wants part of a NOTE to be private, he/she should move that part to a separate NOTE and make that NOTE private.
To me that seems a cleaner way of doing it.

One of the programs I used had another setting:
You could (temporarily) set a mark to override the privacy, in case a person had given permission to publish under certain circumstances, like publishing a family book just for family members:
[image: screenshot of the program's privacy options]

"privacyfilter staat uit" means "privacy filter is off" (this is a program-wide setting)
"uitsluiten van publicatie" means "don't publish"
"toestemming tot publicatie verleend" means "permission granted to publish"
I don't know if that kind of thing could be added to RESN.

I would agree that marking sections of a NOTE as private, with other parts of the NOTE being unrestricted, is not a good idea.

Hopefully the software mentioned now supports multiple NOTE tags.

@Norwegian-Sardines I have no idea. They have been working on it for 3 years now, and the users have no idea how things are going to look or what is going to be implemented.

The new program I use was very open during the whole implementation process. After a certain point you were invited to work with what they had up to that point, give comments, and report problems, with the restriction not to use it for your real tree until the official release on June 8 this month.

I would agree that marking sections of a NOTE as private, with other parts of the NOTE being unrestricted, is not a good idea.

The main intention of the proposal was not to protect parts of a NOTE. The example was only intended to show how a RESN tag could affect higher-level structures.
A better example would be the suppression of the cause of death if, for example, this should not be made available to the general public in the event of a suicide (INDI.DEAT.CAUS.RESN).

I agree that subtags of facts should have a way to restrict access, not only for online display but even for reports, charts and transfer by GEDCOM!

I suspect that we would get a lot of “push-back” for a proposal to do that as many programs would have serious design and processing issues supporting data privacy at that level.

I could see someone, sometime, wondering why only CAUS (cause) was restricted and not DATE, PLAC, etc., and giving a good example of a reason to restrict their access or display. It might cause intense pain to a family to see the place or date of a member's death, or that a sad event happened on an otherwise happy day!

I suspect that we would get a lot of “push-back” for a proposal to do that as many programs would have serious design and processing issues supporting data privacy at that level.

[rant start]
1: Every program is free to implement whatever it wants from GEDCOM; if it does not like something, it can skip it. Attention! I don't say they should, but they can and they do!

2: Laws are changing more and more because of privacy issues. So if a program does not want to support that when GEDCOM makes it possible, then "Good luck" to those who think they know better!

3: Many programs have their own special kind of input and output, in whatever database format they have chosen. I only know of 2 programs that have input and output solely in GEDCOM: Ancestris and Family Historian. To me they are the only 2 that have a right to speak, so to say, because they are the only ones that have to put each and everything from their design in a GEDCOM file. All others can (and do) put everything that does not fit in a NOTE or another tag that comes close, or they "invent" their own tags and throw their "very handy and useful and free text" stuff in there, and don't care at all if it does not follow the GEDCOM standard! So the whole interchange function of GEDCOM becomes a mess because of that. Yes, I know that, because in Ancestris at this moment there exist around 20 different import routines to help the poor users, who think they have a standard GEDCOM, to import those "general GEDCOM" files into the program! So where is the GEDCOM standard there then?

Sorry for that, but I am really angry about it. We have so many users who think they have standard GEDCOM output, because their old program told them "we produce standard GEDCOM", and who run into trouble trying to import it into a program that does its utmost to follow the GEDCOM standard to the letter. We get the blame when it turns out the import does not succeed, or succeeds only with hundreds of errors. So our developers have to program another special routine for input from another "great" program to clean up that mess and make it a real standard GEDCOM.

If programs have problems in their design, it's not GEDCOM's fault, but theirs!

[rant end]

Hope I am not kicked out now, but this is some real experience from real situations.

@Norwegian-Sardines My last post was not meant in a bad way or because I am angry at you (would not dare to), so it has nothing to do with you personally, but your sentence triggered me. I was thinking today about people who do not respect GEDCOM but do have comments and complaints, and thought maybe I should start another thread about that, but I did not really dare to do that. And I saw your remark in another post, #455, also a kind of rant :) So when I read that sentence I reacted to it.

Many programs have their own special kind of input and output, in whatever database format they have chosen. I only know of 2 programs that have input and output solely in GEDCOM: Ancestris and Family Historian.

I don’t use Ancestris, but Family Historian, although very good at supporting GEDCOM v5.5.1, is still not perfect: it does add its own variations of GEDCOM, in particular structures based around Sourcing and Places.

The software I use as my primary application supports v5.5.1 very well, missing no subtags found in GEDCOM. It would probably enjoy having GEDCOM include some support for subtag privacy constraints.

Proposal: A RESN tag should generally be used to protect the hierarchical structure above it. A '1 RESN' protects a complete record, a '2 RESN' protects a tag at level 1, a '3 RESN' protects a tag at level 2, and so on. In theory, parts of a NOTE structure could also be marked as private with a RESN tag (INDI.DEAT.NOTE.CONT.RESN).

Allowing RESN in arbitrary locations would create a lot of difficult cases. For example:

  • INDI.NAME.GIVN.RESN would hide data that is visible in INDI.NAME.
  • PLAC.MAP.LATI.RESN would hide the latitude, but not the longitude of a place.
  • OBJE.FILE.FORM.RESN would hide the type of a media file, but still allow it to be shown.
  • NOTE.LANG.RESN would let you view a note, but not know the language in which it was written.

CONT is a pseudo-structure to allow newlines to be serialised.

This comment only contains context information; I'll post my thoughts about the specific proposals separately.


RESN is odd. GEDCOM has

  • Genealogical assertion data (such as INDI, MARR, DATE)
  • Genealogical research data (such as SOUR, NOTE, REPO)
  • File and user metadata (such as HEAD, CHAN, SUBM)
  • Directions on how the file should be used by others (only RESN)

RESN is hard to describe in a way that makes sense. It supports exporting information to a file with a marker that that information should not be exported to a file. Why does that exist? I suppose the answer is "so I can share data with trusted people while telling them how to censor it if they want to share it with untrusted people," but that feels like a much bigger task than I trust RESN, or any reasonable extension of RESN, to correctly handle.

In the 7.0 drafting committee we couldn't come to any agreement on the purpose of RESN or a criterion for where it should be included. Some people loved it and wanted it many more places. Some people hated it and wanted it gone. Some people thought its values should be replaced with some kind of more literate assertion set describing who the data could be shared with under what conditions. As consensus was not forthcoming, we decided to move forward with minimal changes: we (a) clarified what happens when you remove a structure (because of RESN or other reasons), (b) marked enumerated values consistent in their structure, and (c) added RESN to OBJE records because we found applications that already used it there.

For context,

Version        Locations of RESN
5.4 and 5.5    INDI
5.5.1          INDI, FAM, events
7.0            INDI, FAM, events, OBJE record

This comment only contains context information; I'll post my thoughts about the specific proposals separately.

Can you post a link here to the topic you are going to write?

Here's what I see as the motivating principles behind the various proposals above.

  1. Anywhere sensitive data could appear should be within the scope of something that can be marked with a RESN. Hence, all records should have RESN.
  2. Some substructures "commonly" have more sensitive data than the rest of their superstructure. RESN should have granularity to handle these. Hence, NOTE should gain RESN and events should keep it.
  3. Any structure that could "reasonably" have more sensitive data than the rest of their superstructure should allow that fact to be recorded in a RESN. CAUS, NAME, DATE, and so on included.
  4. Many things could be sensitive and long lists of special cases are problematic to implement and maintain. Hence RESN should be a universal substructure, allowable under any structure.
  5. We should not add RESN where it potentially creates ambiguous or conflicting information. Hence partial-information substructures like GIVN, LATI, FILE, and LANG should not have RESN.
  6. RESN isn't genealogical data, it's metadata used to decide what to include when serializing the data. That makes it more like CONT than like a regular structure.

If I've missed or mis-represented a motivating principle, please let me know.


Here's my take on these motivations.

1 (record.RESN) --- I can buy this; I don't anticipate using REPO.RESN, but I can see the value in having it for special cases.

2 (NOTE.RESN) --- I get the motivation for common cases, but expect strong disagreements about what counts as "common". Not my first choice, but I could be persuaded.

3 (many.RESN) --- I think the implementation complexity is greater than the added utility warrants for most applications. I hope some applications do this, but think putting it in the standard would convey a false impression about its likelihood of support.

4 (any.RESN) --- This would make some applications happy (those with a hierarchical or object-oriented internal structure) but others unhappy (those with a relational or less-GEDCOM-like internal structure). I don't think it's a net win.

5 (not partial-data RESN) --- Nonsensical and self-contradictory data is already permitted in various ways, so I'm not very worried about this.

6 (RESN as metadata) --- I'm intrigued by this, though I'd want to think it over some more; but in the end I think this runs into the same problem as 3: too much implementation lift for too little result. That said, if we instead made RESN a serialization superstructure instead of substructure of private data then applications that chose not to support RESN would naturally have a "safe" behavior of skipping the unknown structures. Still probably too big a change for too little benefit, but if we really want "RESN everywhere" I'd want to explore this option further.

I'll try to comment:

First,
I think your ideas are great. By now a lot of programs, at the request of their users, have already implemented many ways of restricting their data. It would be far better if GEDCOM provided a general solution for that, as you have given above.

Second:
You said " It supports exporting information to a file with a marker that that information should not be exported to a file. Why does that exist?" Simple: lots of people have many questions about the safety of their data: "can other people look at my data?" "Can I export only xxxx to my family because of privacy?" "how do I restrict the data I want on a website?"
(real-life user questions, asked over and over)
So it is not about exporting the file in the normal daily way, for the use of the person who creates his/her tree. The markers are not for that. They are for the few times that a user wants to export only what he/she thinks others should see, or what he/she thinks should be visible on the web.
The programs I use both keep their data only locally on someone's hard disk, not on a website. Only when that person, after doing a lot of work on his tree, decides it's time to show that work, does it become necessary.
But to be able to do that, those markers are needed.
It's the responsibility of the software how to handle those markers.

Comment on your points:

1 (record.RESN) --- A repository can be your grandpa, about whom you want no information to be public (no info about your grandpa himself, I mean).

2 (NOTE.RESN) --- There could be NOTEs describing that someone was convicted of murder, or (in my tree) that someone did wrong things in the last war. We already have the possibility to write more than one NOTE at a certain place. That means we can write the "non-sensitive" info in the other NOTEs and have 1 NOTE with the sensitive data marked with a RESN.

3 (many.RESN) ---
4 (any.RESN) ---
Maybe we should think of a list of useful places. I am not the GEDCOM expert you all are, but making a list could give a better idea of where it could be useful.

The others I don't really know. I think that's my lack of knowledge of GEDCOM (I am learning, but not as quickly as I want). What do you mean by "a serialization superstructure"?
Also, I seem to have problems with what you call "above" for restricting something.
As GEDCOM can be seen as a layered structure, where each level jumps to the right, "below" to me would mean everything inside that level and the levels further to the right of the marked TAG. But I guess that's my logic. :)

Last thing:
I started working on a list of every "extension" (I call them _TAGs) I could find on the web. I knew I had seen such a list long ago. And indeed I found a lot of those!
If we had that, we would be able to see what things seemed to be missing, and what problems were dealt with and in what ways.
That might help in determining what problems others tried to "solve" because they were missing something in GEDCOM.
That certainly does not mean everything from that list should be in GEDCOM, but it would give an idea of what we are talking about.
No idea if you think that might be useful.

Just a random thought...

It is often the case that a user will want to restrict all instances of a particular tag. e.g. INDI.SSN or INDI.DEAT.CAUS.

Is there a case for specifying these "global" privacy restrictions? Perhaps something like this in the HEAD?

0 HEAD
...
1 RESN privacy
2 FOR INDI.SSN
1 RESN confidential
2 FOR INDI.DEAT.CAUS
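
As a rough illustration of how an exporter could act on such a header, the Python sketch below reads the proposed HEAD.RESN.FOR entries and then drops every structure whose tag path matches one of them. Note that `FOR` is only the tag suggested in this comment, not part of GEDCOM, and the function names `read_global_restrictions` and `filter_by_path` are made up for this example.

```python
def read_global_restrictions(lines):
    """Collect the paths listed as '2 FOR <path>' under '1 RESN' in the HEAD."""
    paths = set()
    in_head = False
    in_resn = False
    for line in lines:
        tokens = line.split(" ", 2)
        level, tag = int(tokens[0]), tokens[1]
        if level == 0:
            in_head = (tag == "HEAD")
            in_resn = False
        elif in_head and level == 1:
            in_resn = (tag == "RESN")
        elif in_head and in_resn and level == 2 and tag == "FOR":
            paths.add(tokens[2])
    return paths


def filter_by_path(lines, restricted_paths):
    """Drop each structure (and its substructures) whose path is restricted."""
    out = []
    path = []          # tags from level 0 down to the current line
    skip_below = None  # level of the structure currently being dropped
    for line in lines:
        tokens = line.split(" ")
        level = int(tokens[0])
        # Record lines carry an xref first, e.g. "0 @I1@ INDI".
        tag = tokens[2] if tokens[1].startswith("@") else tokens[1]
        if skip_below is not None:
            if level > skip_below:
                continue  # still inside the dropped structure
            skip_below = None
        path = path[:level] + [tag]
        if ".".join(path) in restricted_paths:
            skip_below = level
            continue
        out.append(line)
    return out


gedcom = [
    "0 HEAD",
    "1 RESN privacy",
    "2 FOR INDI.SSN",
    "1 RESN confidential",
    "2 FOR INDI.DEAT.CAUS",
    "0 @I1@ INDI",
    "1 NAME Jan /Jansen/",
    "1 SSN 123-45-6789",
    "1 DEAT",
    "2 DATE 1 JAN 1950",
    "2 CAUS (sensitive)",
    "3 NOTE detail about the cause",
    "0 TRLR",
]
restricted = read_global_restrictions(gedcom)
exported = filter_by_path(gedcom, restricted)
print("\n".join(exported))
```

This sketch ignores the difference between privacy and confidential and simply hides both; a real exporter would probably let the user choose per value.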

I think of something else too:
The Dutch program I used had the possibility for all INDIs to add a tag called "Permission granted". That is mandatory under our country's privacy laws in case private information is published by someone other than the owner of that information.

So I think RESN needs something added like "PERMISSION" (or Granted or any other word) denoting that the user of the tree has written permission to show this particular private information in a public tree, on the web for instance.

The Dutch program I used had the possibility for all INDIs to add a tag called "Permission granted".

My own application has something similar. It adds none as one of the RESN values.

1 RESN none indicates that this record has no restrictions and is public.

/edit - this was previously discussed in #220

@fisharebest
How about:

0 HEAD
...
1 RESN CONFIDENTIAL
2 AGE 100 years
2 FOR INDI.DEAT.CAUS

So any INDI.DEAT.CAUS older than 100 years would no longer have a privacy restriction.
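
A small Python sketch of that age check, under some loud assumptions: the AGE line under HEAD.RESN is only the idea proposed here, its value is taken to look like "100 years", and the year is read as the last number in the DATE value, which ignores date ranges, calendars, and other date forms. The names `parse_age_years`, `year_of`, and `restriction_applies` are invented for the example.

```python
import re
from datetime import date


def parse_age_years(value):
    """Read the proposed threshold, e.g. '100 years' -> 100."""
    match = re.fullmatch(r"(\d+)\s*years?", value.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unsupported AGE value: {value!r}")
    return int(match.group(1))


def year_of(gedcom_date):
    """Take the trailing 3- or 4-digit number of a DATE value as the year."""
    match = re.search(r"(\d{3,4})\s*$", gedcom_date)
    return int(match.group(1)) if match else None


def restriction_applies(event_date, age_value, today=None):
    """True while the event is younger than the threshold."""
    today = today or date.today()
    year = year_of(event_date)
    if year is None:
        return True  # unknown date: stay on the safe side and keep it private
    return today.year - year < parse_age_years(age_value)


reference_day = date(2024, 1, 1)
recent = restriction_applies("1 JAN 1950", "100 years", reference_day)
old = restriction_applies("12 MAR 1890", "100 years", reference_day)
undated = restriction_applies("", "100 years", reference_day)
print(recent, old, undated)
```

Treating an undated event as still restricted is a design choice of this sketch; the comment above does not say what should happen in that case.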

My own application has something similar. It adds none as one of the RESN values.

That's something different. It looks NON-private then.
I mean it should show that permission has been granted for it.

If that is missing, people might get comments from others that they show private information.
With this tag it is obvious there IS permission to do so.

addition:
The owner of information on the web or in a published book is always responsible for the information inside it.
If that information has a tag PERMISSION, the software putting that information on the web can publish a general message for that note or other info, something like: permission granted.
And then, if someone finds that private info, they can turn to the user who put it on the web for further questions.

But in the case of NONE, it could also be that there has never been any privacy restriction on that piece of info. Software finding that doesn't know how to handle it.

I was thinking (as I did in another post) to have for instance

1 RESN CONFIDENTIAL
...
1 RESN CONFIDENTIAL, PERMISSION

From that last example, software can know that although this info might be private (for instance people born in the last 100 years, or the cause of death, etc.), the user has the necessary permissions, so the software can handle it as non-private and add that special message.
The rest is then up to the user. Not to the software.
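
The decision described above could be sketched in Python like this. PERMISSION is only the value proposed in this comment, not an existing RESN value, and the function name `publication_decision` and the message text are placeholders chosen for the example.

```python
SENSITIVE = {"CONFIDENTIAL", "PRIVACY"}


def publication_decision(resn_value):
    """Return (action, message) for a comma-separated RESN value."""
    flags = {part.strip().upper() for part in resn_value.split(",") if part.strip()}
    if not flags & SENSITIVE:
        return ("publish", "")  # never marked as private
    if "PERMISSION" in flags:
        # Private data, but the user states that permission was granted.
        return ("publish", "Published with permission.")
    return ("hide", "")


without_permission = publication_decision("CONFIDENTIAL")
with_permission = publication_decision("CONFIDENTIAL, PERMISSION")
unmarked = publication_decision("")
print(without_permission, with_permission, unmarked)
```

Unlike a NONE value, the combined value keeps the information that the data was considered private in the first place, so the software can show the permission message only where it is needed.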

This issue looks like a duplicate of #221; to clean up the issue tracker, further conversation should take place there.