USArmyResearchLab/ARL-Open-Source-Guidance-and-Instructions

Source history, commit messages, and author email address

spyhunter99 opened this issue · 4 comments

Sorry, lots of topics to cover in this one...

What is ARL's policy on git/svn history, commit messages and author email addresses? Does it get published along with the release process or omitted/obfuscated? (do reviewers even know what this means and does it imply that the source history also has to be examined by reviewers?)

Say there was some .mil project that was being open sourced. Does the source history travel with it? Say the source history needs to be rewritten to obfuscate or remove something. How does that interact with the Freedom of Information Act (FOIA)?

For git, an email address is required for each commit. GitHub requires a real email address, but for obvious reasons, commits with real .mil addresses may need to be obfuscated during the release. What is ARL's open source policy on this? Personal, fake, or official email addresses for commits? What about the committer's name? Real, fake, obfuscated? Initials?

Many programming languages with documentation tooling let the author be declared in the source itself, such as Javadoc's @author tag. Again, real, fake, obfuscated or omitted?

Just to address one issue, I would assume handling author identity and contact info would be no different than if we were talking about a research article in a public journal, where author name, email, affiliation, institution address, etc., are published with the document without issue. So any issue with that should be covered by the normal public release approval and security review process.

@spyhunter99 To answer your questions/comments in order:

  1. @NRJank is correct about how we're handling commit messages and author identity. In general, we're treating code kind of like an article that is being amended continuously. Since papers that are approved for public release also have the author(s) name(s) attached, additional commits are treated the same way.

  2. Everything that is published has to be OPSEC reviewed. That means that source history has to be reviewed. As a practical matter, if a project has a very long history and it isn't feasible to look through all of it, it may be necessary to simply dispose of the history and start with a clean repository.

  3. I'm not a lawyer, so my comments about FOIA may be wrong. That said, my personal understanding is that source code is not considered a Government Record, and therefore not subject to FOIA requests. Thus, disposing of the history will not violate FOIA. (I'll be passing this question along to the ARL lawyers, so there may be an update to this comment later on).

  4. ARL's policy is to use the author's real name and real (Government) email address.

Part of the purpose of ARL's Open Source policy is to encourage collaboration with the public at large on projects that ARL has initiated. That would be defeated by obfuscating or omitting contact information.
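
For the history review in point 2, it can help to start with an inventory of every author name and email address recorded in the commits. The following is only a rough sketch, not an ARL tool: the function names are made up for illustration, the `.mil`/`.gov` suffix check is an assumed stand-in for "Government email address", and it looks only at author fields (not committer fields or commit message text, which still need a human review).

```python
import subprocess
from collections import Counter


def parse_identities(log_text):
    """Count 'Name <email>' lines as printed by: git log --format='%an <%ae>'."""
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


def non_government(identities, suffixes=(".mil", ".gov")):
    """Return identities whose email does not end in one of the assumed suffixes."""
    flagged = []
    for ident in identities:
        # Take the text between the last '<' and the closing '>'.
        email = ident.rsplit("<", 1)[-1].rstrip(">").lower()
        if not email.endswith(suffixes):
            flagged.append(ident)
    return sorted(flagged)


def identities_in_repo(repo_dir):
    """Collect author identities from every branch of the repository at repo_dir."""
    result = subprocess.run(
        ["git", "log", "--all", "--format=%an <%ae>"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return parse_identities(result.stdout)
```

Calling `non_government(identities_in_repo("path/to/repo"))` would then list the identities a reviewer might want to look at first.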

Reference: here is a lot more discussion (and flaming?) on the FOIA / git history topic:

https://github.com/deptofdefense/code.mil/issues/67

Read that and form your own opinion.


A quick note on disposing of history to speed up OPSEC reviews. Many people are going to just delete all old code because it is hard enough to do an OPSEC review for one version. If practical issues do not allow reviewing all history, you may consider using an "abbreviated history". For this you review just the releases (the major ones, the minor ones, or all point releases). Then create a new git repository and load each of these versions in as a single commit with the appropriate date.
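
The abbreviated-history idea can be sketched roughly as follows. This is a minimal illustration under several assumptions: git is installed, `user.name` and `user.email` are already configured, and each reviewed release has been unpacked into its own directory. The release list, paths, and function names are all hypothetical. The commit date is set through git's `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` environment variables.

```python
import os
import shutil
import subprocess
from datetime import datetime
from pathlib import Path

# Hypothetical input: (version label, ISO 8601 release date, reviewed snapshot directory).
RELEASES = [
    ("1.1.0", "2016-09-15T12:00:00+00:00", "snapshots/1.1.0"),
    ("1.0.0", "2016-03-01T12:00:00+00:00", "snapshots/1.0.0"),
]


def ordered(releases):
    """Oldest release first, so the new history reads chronologically."""
    return sorted(releases, key=lambda r: datetime.fromisoformat(r[1]))


def commit_command(version, date_iso):
    """Return (argv, extra_env) for one snapshot commit dated at the release."""
    # --allow-empty keeps the run going if two snapshots happen to be identical.
    argv = ["git", "commit", "--quiet", "--allow-empty", "-m", f"Release {version}"]
    env = {"GIT_AUTHOR_DATE": date_iso, "GIT_COMMITTER_DATE": date_iso}
    return argv, env


def replace_worktree(repo, snapshot):
    """Delete everything in repo except .git, then copy the snapshot in."""
    for entry in list(repo.iterdir()):
        if entry.name == ".git":
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    # dirs_exist_ok needs Python 3.8+; skip any .git directory inside the snapshot.
    shutil.copytree(snapshot, repo, dirs_exist_ok=True,
                    ignore=shutil.ignore_patterns(".git"))


def build_abbreviated_history(repo_dir, releases):
    """Create a fresh repository with exactly one commit per reviewed release."""
    repo = Path(repo_dir)
    repo.mkdir(parents=True, exist_ok=True)
    subprocess.run(["git", "init", "--quiet"], cwd=repo, check=True)
    for version, date_iso, snapshot in ordered(releases):
        replace_worktree(repo, Path(snapshot))
        subprocess.run(["git", "add", "--all"], cwd=repo, check=True)
        argv, extra_env = commit_command(version, date_iso)
        subprocess.run(argv, cwd=repo, check=True, env={**os.environ, **extra_env})
```

Running `build_abbreviated_history("public-repo", RELEASES)` would leave a repository whose log shows one commit per release, each stamped with that release's date. Only the snapshots that were actually reviewed end up in the published history.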

@fulldecent You're right. And just like the discussion at deptofdefense/code.mil#67 points out, the Government may be required to keep copies of old history around as a record. That said, the publicly published repo could have an abbreviated history, which will be easier to OPSEC review.