golang/go

proposal: x/mod/zip: provide a way to ignore files when creating module zip

dolmen opened this issue · 5 comments

Proposal

As a Go module author I would like a way to tell golang.org/x/mod/zip to ignore some files from the VCS repository where my module is published.

I propose to add a file that lists patterns of files to ignore, in a similar way to the well-known .gitignore. That list would be used to filter the set of files included when creating a module zip.

Rationale

A VCS repository that contains a Go module may also contain many files irrelevant to the use of that module.
Currently most of the content of the repository is embedded in Go module zip files.
And most of the big files stored in Go modules are ones irrelevant to the Go build for an end user (like fuzz data, videos, images, HTML documentation...).

The current list of restrictions for files in a module zip is quite short: https://go.dev/ref/mod#zip-path-size-constraints

You can check the list of files on your own machine with this command:

find $(go env GOMODCACHE)/*.* -type f ! -name '*.go' ! -name 'go.mod' ! -name 'go.sum' ! -name 'list.lock' ! -name 'v*.mod' ! -name 'v*.info' ! -name 'v*.zip' ! -name 'v*.ziphash' ! -name 'v*.lock' ! -name 'LICENSE*' ! -name 'README*' -print

Check the size (requires GNU find):

find $(go env GOMODCACHE)/*.* -type f ! -name '*.go' ! -name 'go.mod' ! -name 'go.sum' ! -name 'list.lock' ! -name 'v*.mod' ! -name 'v*.info' ! -name 'v*.zip' ! -name 'v*.ziphash' ! -name 'v*.lock' ! -name 'LICENSE*' ! -name 'README*'  -printf "%s\n" | awk '{sum+=$1} END{print sum+0}'

On my machine:

  • 13GB used by module content in module cache (du -hc $(go env GOMODCACHE)/*.* | tail -n 1)
  • 208,430 mostly useless files
  • 3,834,497,841 bytes

This is a waste of resources (network, storage on proxies, storage on build machines, which are often end-user machines).

Implementation ideas

These are just general ideas that would have to be expanded/specified in a design document.

The patterns file would be stored at the root of the module (next to go.mod). If we follow the .gitignore model, ignore files would also be allowed in subdirectories.

Ideas for naming the file:

  • .goignore (like .gitignore)
  • go.ignore (to go with go.mod, go.sum)

Parsing of the file would be handled by new APIs exposed in package golang.org/x/mod/zip.

zx2c4 commented

Alternatively, could x/mod/zip figure out the minimal set of files required, by working out the dependency graph from externally reachable entry points, and then adding a few pre-set project files like LICENSE and README?

Do you want this to create your own zips which you would serve/host, or do you want to change the default set of files selected by go?
If it's the first, then it should only be via API.
If it's the second, it's a dup of #30058 or #42965

@zx2c4: this is not about filtering Go files. This is about filtering non-Go files.

@seankhliao Yes, we can consider this a duplicate of #42965 or #30058: we need a common solution to those problems.

I hope that my new approach to the problem statement will raise the priority of that issue, because I show that the problem affects every developer's machine and grows as the community grows and time passes (each developer gets more and more modules and more and more versions of each module).

In that case I think the conversation can be kept in #42965