golang/go

doc: require auto-generated DO NOT EDIT comment at top of file

rsc opened this issue · 13 comments

rsc commented

Issue #13560 defined the exact form for a comment marking a file as auto-generated.
As linked at https://golang.org/s/generatedcode and then repeated at
https://golang.org/cmd/go/#hdr-Generate_Go_files_by_processing_source,
the rule is:

Generated files are marked by a line of text that matches the regular expression, in Go syntax:
^// Code generated .* DO NOT EDIT.$
The .* means the tool can put whatever folderol it wants in there, but the comment must be a single line and must start with Code generated and end with DO NOT EDIT., with a period.

The text may appear anywhere in the file.

@robpike's original suggestion for placement was to match the rule for // +build:
[Narrator: Good idea; I am going to suggest matching the rule for //go:build below.]

The text must be a // comment, and that comment must appear before but not be attached to the package clause.”

After some bikeshedding about other comments it changed to:

The text must appear as the first line of a properly formatted Go // comment, and that comment must appear before but not be attached to the package clause and before any /* */ comment. This is similar to the rules for build tags.

@bradfitz then raised the question of non-Go source files saying:

I think it's a mistake for this proposal to require knowing anything about Go syntax.

It should be confined to a pattern being matched on the first N lines only, ignoring everything about packages or package comments, etc.

@myitcv raised a question about Go files with syntax errors and suggested anywhere in the file is fine, comparing to go generate.

@robpike then revised to the “text may appear anywhere in the file” rule, which is what was finally accepted.

I was writing a generator the other day and @dmitshur helpfully pointed out that I'd accidentally marked the generator itself as generated by writing this code in the generator:

const header = `
// Copyright 2019 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// Code generated by "makestatic"; DO NOT EDIT.

package static
`

If the rule we've defined flags this program as auto-generated, because it is itself a generator, that seems to me more like a bug in the rule than a bug in the generator.

I propose we change the rule to match the new //go:build rules (#41184).
Specifically, the comment must appear before the first non-blank line, non-comment text.
/* */ comments before the // Code generated ... comment would be allowed.

The rationale is:

  • There's now a sensible placement rule we can share (the //go:build one).
  • The //go:build rule applies equally well to Go and non-Go files, addressing @bradfitz's concern.
  • The //go:build rule avoids any problem with syntax errors, because the lines must be before any syntax, addressing @myitcv's concern.
  • The lines are for people to find, and they are much easier to find if they are at the top (perhaps after a copyright notice and other comments), as opposed to needing to skim through the entire file.
  • (It's of course fine for tools to help find them, but we work hard to keep Go a language that does not require tool support for a good development experience.)
  • The comparison with //go:generate does not really apply. The lax placement there is because we want to allow putting the //go:generate line next to what it applies to, such as putting a //go:generate stringer line above the type that is being stringered. In this case, the // Code generated ... applies to the entire file. It should be above the file.
  • Changing the rule avoids a false positive when a generator itself includes the magic comment in a multiline raw string literal.

It's reasonable to ask: Isn't it too late to change this? What about generators that put the comment later in the file? Their output won't be recognized as generated anymore.

I looked into that using the module mirror-based Go corpus I assembled back in March. Of all the Go source files I have in that corpus, I found:

  • 659,188 files that contained a // Code generated comment.
  • 653,750 of them appear before the package statement.
  • 4,034 of them appear after the package statement but before imports/decls.
  • 1,391 of them appear after the imports but before decls.
  • 13 of them appear after decls.

So making this change would require 0.825% of generated files to be updated. And until they are updated, no big deal - people will still see the comment, and only a few tools care. If we fix the top five generators causing these lines (xo, cdproto-gen, minimock, msgp, chromedp-gen), we'd be left with only 1,008 mismatched files, or 0.15%. In any event, when we first adopted the rule we had to update essentially 100% of generated files. Now we're talking about under 1/100 of that, so the impact here is not large.

On the other hand, consider generators. I had the same scan look for the magic text inside string literals. It found 2,272 instances of string literals containing the text but not at the start/end of a line; those are correctly skipped by the current rule. It also found 2,350 instances of multiline string literals like the one I'd written; all those generators are incorrectly flagged as themselves auto-generated by the current rule.

That is, just over half of the generators I found are doing it wrong by the current rule.
This strongly suggests the rule should be changed.

I've attached the non-top-of-file results as autogen.txt if anyone wants to look into the details here.

autogen.txt

I think requiring the "// Code generated ... DO NOT EDIT." comment to be near the top of the file rather than anywhere, if it can be done, is a change that would benefit both humans and machines reading .go files.

I agree with the bullet points under rationale.

It also seems to me that this change wouldn't be too disruptive or too late, and hopefully people can chime in if they believe otherwise. There are not that many generators/generated files that'll need updating, and many will be "fixed" by the change anyway. There are likely more projects that haven't been updated to use the existing https://golang.org/s/generatedcode convention yet (either because they didn't get around to it yet, or they don't have a need for it), so they can just target the latest version of the convention when they get around to it.


I have a minor clarification question to understand an edge case around the proposed change better:

I think we should change the rule to match the new //go:build rules (#41184), namely that the comment must appear before the first non-blank line, non-comment text. /* */ comments before the // Code generated ... comment would be allowed.

How are whitespace characters meant to be treated? Such characters would be an unlikely situation, but I'm hoping the proposed convention is clear about it. It can happen primarily in non-gofmted files (e.g., a stray trailing space in a generated file, or someone evaluating all edge cases of this proposal). For example:

/* foo */  /* bar */
    
// Code generated by gen. DO NOT EDIT.

package foo

I tried reading the placement section in the //go:build draft design, https://go.googlesource.com/proposal/+/master/design/draft-gobuild.md#design-placement, which says:

The new rule would be:

“Constraints may appear in any kind of source file (not just Go), but they must appear near the top of the file, preceded only by blank lines and other // and /* */ comments. These rules mean that in Go files a build constraint must appear before the package clause.”

It seems like whitespace should probably be allowed too, but that text doesn't include it explicitly. Is it included implicitly or not at all?

rsc commented

Spaces and tabs before the magic line don't matter.
(Go eschews investing invisible text with importance.)

I think many people (myself included) likely believed this to already be the case (which is why so many generators make that mistake).

Here's a question: under the current system, if your generator was based off of a text/template stored in a seperate file, what would be the best way to handle this? would it be ignored as it isn't a go file?

Here's a question: under the current system, if your generator was based off of a text/template stored in a seperate file, what would be the best way to handle this? would it be ignored as it isn't a go file?

I understand that the convention defined as a result of proposal #13560 was meant primarily for .go source files, so a text/template stored in a separate file wouldn't be affected as long as it uses a different file extension. But if someone were to try to use the convention on a text/template file, they may get a false positive.

This is a good point to consider and potentially clarify.

It's likely reasonable to use the same convention for non-.go source files that are supported by the Go build system, but from what I can see, that was not really discussed (other than in #13560 (comment)) or considered in scope of #13560. See https://golang.org/cmd/go/#hdr-File_types, which currently says:

Files of each of these types except .syso may contain build constraints, but the go command stops scanning for build constraints at the first item in the file that is not a blank line or //-style line comment.

I don't think it's viable for the Go project to define a convention to mark generated files of any arbitrary type. For example, some programming languages may not support line comments, so requiring a line that begins with "//" cannot work everywhere.

I understand that the convention defined as a result of proposal #13560 was meant primarily for .go source files, so a text/template stored in a separate file wouldn't be affected as long as it uses a different file extension. But if someone were to try to use the convention on a text/template file, they may get a false positive.

I think the most likely problem scenario would be someone using a text editor mode made for go files on a text/template file (that generated go code). The editor mode may assume that it would only be used on go source files, and thus complain about it. I don't think this (the scenario I laid out, not the issue as a whole) is a major issue though, just something to think about.

rsc commented

It's true that the primary focus for these comments is Go code and more generally Go package sources.
Go package sources today can be Go, assembly, C, C++, Objective C, SWIG, and Fortran.
The comment being discussed here, like the //go:build comment, handles all of these these except Fortran, which seems fine.

<Imagine snarky comment about "DO NOT EDIT" being implied by program being written in Fortran.>

Go package sources today can be Go, assembly, C, C++, Objective C, SWIG, and Fortran.
The comment being discussed here, like the //go:build comment, all of these these except Fortran, which seems fine.

Uh, I think you forgot a word here?

rsc commented

Based on the discussion above, this seems like a likely accept.

rsc commented

No change in consensus, so accepted

Should this be closed?

@cristaloleg No. The proposal has been accepted, but it has not been implemented. At least, not as far as I know. For example, https://golang.org/s/generatedcode needs to point to the updated rule.

rsc commented

I updated golang.org/s/generatedcode to point to the relevant cmd/go docs.
I will also send a CL updating the cmd/go docs.

Change https://golang.org/cl/283633 mentions this issue: cmd/go: fix go generate docs