bazelbuild/rules_go

Expose generated Go files to editors and IDEs

jayconrod opened this issue · 133 comments

When a library is created with go_proto_library or some other mechanism that generates Go code (.pb.go files in this case), the generated source files are not exposed to editors and IDEs. Consequently, code completion and refactoring tools don't work, lowering developer productivity.

The Go team is developing a workspace abstraction mechanism that will allow editors and Bazel (and other build systems) to communicate with each other without the need for direct integration. This is the long-term solution to this problem.

This issue will track progress on the workspace abstraction and will track editor awareness of code generated with Bazel.

This issue is in response to Bazel build, protobuf and code completion.

TODO: update answer when this issue is resolved.

What's the status of this task?

So, I have a proposal for how to close this issue.

Suppose you have generated code:
bazel-genfiles/somedir/somefile.go
bazel-genfiles/somedir/anotherfile.go

Let's add a symlink:
src/somedir => ../bazel-genfiles/somedir

Use it in your files:
import "somedir"

As a result you will get working code completion AND a working build.
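
A minimal sketch of that workaround in Go, in case you want to automate it (the paths are illustrative, and a plain `ln -s` works just as well):

```go
// symlink_genfiles.go: link generated sources back into the source tree.
// Paths are illustrative; adjust them to your workspace layout.
package main

import (
	"log"
	"os"
)

func main() {
	// Equivalent to: ln -s ../bazel-genfiles/somedir src/somedir
	err := os.Symlink("../bazel-genfiles/somedir", "src/somedir")
	if err != nil && !os.IsExist(err) {
		log.Fatal(err)
	}
}
```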

@excavador We're still developing a more complete solution to this (the workspace abstraction I mentioned above). Ideally, tools won't need to be aware of the project layout in use. I don't have any ETA for that yet, but we are making progress.

As you showed, it's possible to work around this with symlinks. You can also copy or symlink generated files into your source directory. Bazel will prefer a generated file if a source file of the same name is present. I'd recommend against checking generated files into VCS though.

Thank you for the notification! I'll wait. You can consider me a closed beta-tester :)

Any updates?

@raliste it works for me in IDEA Ultimate + the Bazel plugin, with the following limitation: autocomplete works only if all files inside a Go package are auto-generated. If you mix auto-generated and manually written files, the auto-generated files are not inspected.

The workspace abstraction mentioned above is being actively worked on. We've gone through a lot of internal prototypes, and we're in the process of reviewing, testing, and merging one that we're pretty happy with. It's a huge body of code though, and it will take time. Following that, a large number of Go tools will need to be updated to use the workspace abstraction so they can function in "go build", vgo, Bazel, and other kinds of workspaces.

I know it seems like progress is slow, but it's an interface that many, many tools will interact with, and we want to be sure we get it right. It's a huge amount of work, but it is our highest priority task.

@jayconrod Are there any public discussions about that? Design docs perhaps?

Not yet, but soon. We're still working on simplifying and narrowing the interface.

Can you briefly explain how this workspaces feature is going to work?
Are there any upcoming changes in the 1.11 related to this?

The first public piece of this is golang.org/x/tools/go/packages. At the moment, that is based on go list and is intended to be used by tools that support go build and vgo. We're also working on a workspace abstraction layer that go/packages may eventually be built on. That workspace abstraction would be extensible and could include Bazel. Lots of work is still happening there.
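
For reference, a minimal example of the go/packages API described above (the pattern and mode flags are illustrative):

```go
package main

import (
	"fmt"
	"log"

	"golang.org/x/tools/go/packages"
)

func main() {
	cfg := &packages.Config{
		// Ask only for names and file lists; richer modes cost more to load.
		Mode: packages.NeedName | packages.NeedFiles,
	}
	pkgs, err := packages.Load(cfg, "./...")
	if err != nil {
		log.Fatal(err)
	}
	for _, pkg := range pkgs {
		fmt.Println(pkg.PkgPath, pkg.GoFiles)
	}
}
```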

@jayconrod Any updates on the proposed workspace abstraction layer?

@akshayjshah We've made some progress. The golang.org/x/tools/go/packages API is nearing completion. Our team has migrated a few tools to use it already, but the effort is ongoing.

If you set the GOPACKAGESDRIVER environment variable, or if you have a binary named gopackagesdriver in PATH, go/packages will shell out to that binary for queries. I have an implementation of that binary about 70% done. It works in most situations, but it needs polish, documentation, and tests.

So ideally, once that change is in, you'd be able to build and install the driver into your PATH, and tools that use go/packages will just work in Bazel workspaces (and they'll fall back to the default behavior if there's no WORKSPACE file).
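
Roughly, a driver is a binary that receives the query patterns as arguments plus a JSON request on stdin, and prints a JSON response describing the packages. A skeleton, with struct shapes based on a reading of go/packages at the time (treat the exact wire format as an assumption):

```go
// Skeleton of a GOPACKAGESDRIVER binary. The request/response shapes below
// mirror the go/packages driver protocol as I understand it and may not
// match the final format exactly.
package main

import (
	"encoding/json"
	"log"
	"os"
)

type driverRequest struct {
	Mode       int      `json:"mode"`        // bitmask of packages.Need* flags
	BuildFlags []string `json:"build_flags"` // passed through from packages.Config
	Tests      bool     `json:"tests"`
}

type driverResponse struct {
	NotHandled bool          // true tells go/packages to fall back to go list
	Roots      []string      // IDs of the packages matching the patterns
	Packages   []interface{} // serialized packages.Package values
}

func main() {
	var req driverRequest
	if err := json.NewDecoder(os.Stdin).Decode(&req); err != nil {
		log.Fatal(err)
	}
	patterns := os.Args[1:] // e.g. "./..." or Bazel labels

	// A Bazel-backed driver would run `bazel build` here with an output
	// group or aspect and translate the results into packages.
	_ = patterns

	resp := driverResponse{NotHandled: true} // placeholder: defer to go list
	if err := json.NewEncoder(os.Stdout).Encode(&resp); err != nil {
		log.Fatal(err)
	}
}
```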

Thanks for the update! Would you be willing to push your driver to a branch so I can take a look? I'm not in a huge rush for you to officially release it, but I'd love to use your rough draft to better understand how these pieces should fit together.

@akshayjshah Sure, it's at https://github.com/jayconrod/rules_go/tree/dev-gopackagesdriver if you'd like to take a look. As I said, it's very rough right now.

That looks awesome, thanks! Is there any update on this?

@pierreis Sorry, no updates since my last comment. I'm still planning to work on this this quarter. There are a lot of things going on, but this is one of the higher priority projects.

Generated .pb.go files are now available in the go_generated_srcs output group since #1827.

Leaving this issue open though since it's not documented yet.

@jayconrod is the packages driver still planned as a feature, or are we going to rely on the aspect-based approach mentioned above?

A driver is still planned, but I don't have any update to share yet. Implementation-wise, the data will be produced by the builder and accessed through an output group (but the driver will hide those details).

I assume the driver is going to be something integrated into the IDE?

@jayconrod rebased your branch and updated it with the latest changes from the go/packages API to play around a bit with the driver concept. I managed to get it to a point where I can use the bazel driver to power some of the module enabled tools like godef-mod or gopls. You can find it at https://github.com/robbertvanginkel/rules_go/tree/dev-gopackagesdriver.

Some of the code is a bit gnarly, it needs some tests, and there are still a few TODOs in there, but I'd love to help out and get the driver added to rules_go. Since the branch is mostly based on your changes, do you have a suggestion on how we could collaborate on this?

@robbertvanginkel I'm probably not going to keep most of the code on that branch. I don't think gathering information through separate actions via an aspect will be fast enough, and it adds a lot of complication. Instead, I'm thinking of having the compile action emit JSON metadata, then the driver can just "build" an output group and collect the results.

But a fair bit of refactoring is needed on the builder side of things first. I have a half-written PR to pull everything into a single binary and have the toolchain own that.

In short, I'm not ready to collaborate on this yet.

Ok, thanks for the update! Please let me know if you reach a point where someone can help out.

@jayconrod I'd love to help out too.

asv commented

@jayconrod I'm still a newbie with Skylark (and Bazel concepts). Can you tell me how to copy (or link) the files described in the OutputGroup go_generated_src to some folder inside the workspace? In other words, can I now generate .pb.go files a la protoc + protoc-gen-go using rules_go?

@asv This is something I wish were simpler in Bazel. You can build with the command line argument --output_groups=go_generated_src (or any comma separated list of group names). To get the paths to the files in a machine-readable format, you can use --build_event_json_file or --build_event_proto_file, depending on whether you want to parse JSON or proto. build_event_stream.proto describes the format. You'll be looking for an event with a TargetComplete payload.

asv commented

@jayconrod I got it!

Small fix to your post: --output_groups=go_generated_srcs. The name of the group is go_generated_srcs, not go_generated_src. It's a pity that Bazel does not complain about an error in the name.
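
A hedged sketch of pulling generated file paths out of the JSON event stream (field names follow build_event_stream.proto as I understand it; the simplification of printing every namedSetOfFiles event, rather than walking TargetComplete to its file sets, is mine):

```go
// Scan a --build_event_json_file output for generated file URIs, e.g. after:
//   bazel build --output_groups=go_generated_srcs --build_event_json_file=bep.json //...
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"log"
	"os"
)

type event struct {
	NamedSetOfFiles *struct {
		Files []struct {
			Name string `json:"name"`
			URI  string `json:"uri"`
		} `json:"files"`
	} `json:"namedSetOfFiles"`
}

func main() {
	f, err := os.Open("bep.json")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() { // one JSON event per line
		var ev event
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil || ev.NamedSetOfFiles == nil {
			continue
		}
		for _, file := range ev.NamedSetOfFiles.Files {
			fmt.Println(file.Name, file.URI) // file:// URIs for the generated files
		}
	}
}
```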

What's the state of this ticket? It looks like the necessary functions were added to Bazel. It also looks like there's a go package to support it. What's missing?

@clanstyles Nothing external is blocking this work anymore, but I'm working on landing some architectural changes first that will simplify this and fix a number of other issues.

In particular, I'm working on moving nearly all of the logic to build individual packages into the execution phase in a single action. This will dramatically reduce the loading / analysis burden for cgo, and it should enable better cross-compilation in the future. It should also lower remote execution overheads. One of the outputs of this action will be json files that can be picked up by a gopackagesdriver binary and presented through the golang.org/x/tools/go/packages API. The work so far is on my fork in the dev-compilepkg branch if you're curious.

As you can imagine, rewriting anything related to cgo is complicated, and it takes time to get it right. These days, I'm primarily working on module support in the Go command and tools to validate module releases. rules_go and Gazelle are still active of course, but most of the time I have allocated each week goes to answering questions, fixing critical bugs, and reviewing PRs.

Is there any workaround rule or example to copy or link the generated .pb.go files into the source directory via bazel run?

@TennyZhuang I use a basic bash script to symlink the generated proto files: https://gist.github.com/kalbasit/3c9607333f5f3c794d2bd96114184301

@kalbasit when are you triggering this?

@clanstyles manually. I know that it sounds like a lot to remember, but it does become a habit. If I'm changing a protobuf, I run kt proto. If I'm changing an asset, I run kt gen. For context, kt stands for KeepTruckin and is a rich CLI to interact with our mono-repo, driven by Nix/Bazel.

You can probably get something integrated with your editor triggered upon saving. I do not recommend doing that if it's going to add latency to your overall editing experience.

@jayconrod - It seems like you had made a bunch of progress on this issue, but you've walked it back to do the architectural changes first. Would there be value in making the feature available (at least to collaborators) so more people can get value sooner? Then the architectural changes can come in under the covers and everyone who is using the interim system will benefit.

@strican Architectural changes landed in #2027. I'll implement this as soon as I have bandwidth.

Jay, are you open to a PR for this?

@akshayjshah No, probably not for this. I don't think there's going to be a lot of code, but it needs to be carefully written across several layers.

@jayconrod any update on this? what is the timeline we should track for this change?

@ChinmayR I did some prototyping on https://github.com/jayconrod/rules_go/tree/dev-gopackagesdriver a few weeks ago. A few specific cases are close to working, but as a whole, it's not all that close.

My time is split between several projects. Of the time I'm able to spend on rules_go, most of it goes to maintaining compatibility with new versions of Bazel. #2079 and #2089 for example are taking all my time this week and last week. I'm also working on closing release blockers like #2058 before the next major release.

Sorry, I know this isn't moving as fast as everyone would like, but stability comes first.

@jayconrod I know you're busy, but what items are remaining on this? Would it be possible for the community to contribute?

Does anyone have gopls working (reliably) with vim+bazel?

As Dan said above, is there a specific task list that we could help with? If it's a lot of exploratory work I can understand that you have to work on it alone, but if there were a design doc or similar.. :)

To recap and add to what's already been said: this is the most important feature request currently pending in rules_go. However, I prioritize maintenance work (especially Go and Bazel compatibility) and bug fixing over features, and lately, there's been a lot of the former. Bazel has been shipping a lot of breaking changes before their 1.0 release planned for this fall, and for the last month or two, it's been taking 40-50% of my work week to keep up with that, plus operational stuff like tagging releases, triaging issues, and answering questions. My time budget for rules_go and Gazelle is ~20%, so that has meant mornings and weekends.

While this feature is high priority, it's also very large and complicated. I'd guess it's on the order of 15-30 eng-days for the main implementation, plus a long tail of bug fixing and support. The main chunk of work is a binary that implements the driver interface of golang.org/x/tools/go/packages. The driver is invoked by packages, and it needs to run bazel build with some output group or aspect. From there, it will extract the output and print JSON objects for each package. The objects may contain anything from source file names to fully parsed ASTs with type information, depending on what's requested. There are a number of complications for the driver:

  • It needs to work with regular Go packages and standard library packages, which are built differently or not at all.
  • That means deep integration into a couple different actions within the builder. We will need to define some new actions. Some of the work I did on this earlier was extracting information for the standard library. I wrote a new action just to produce a zip file of JSON that the driver can extract and return.
  • The driver needs to give packages whatever metadata was requested without incurring any unnecessary cost. If packages only asks for file names in each package, we shouldn't compile anything. If it needs paths to compiled .a files, we do need to compile.
  • packages may ask for dependency metadata, so there may be an aspect involved. And this needs to not break the aspect we currently use for cross-compilation and also not make that aspect more difficult to get rid of in the future.
  • The driver needs to detect whether it's actually being used in a Bazel workspace and fall back to go list if not (see the sketch after this list).
  • packages has some overlay functionality for new files that are being edited but aren't saved yet. I have no idea how that will be implemented, since we won't have build files for these yet.
  • All of this needs to be very fast, since the user is waiting. gopls has some caching, but I'm sure we'll need to wait for the driver in some places.
  • It's not clear to me if the driver should operate in a separate output base to avoid locking the main workspace. I think the Blaze IntelliJ extension does this, but it's expensive in terms of disk space and RAM.
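
As one concrete slice of the above, here is a minimal sketch of the fallback bullet (the WORKSPACE-walking heuristic and shelling out to go list are assumptions about how a driver might do this, not settled design):

```go
package main

import (
	"os"
	"os/exec"
	"path/filepath"
)

// inBazelWorkspace reports whether dir or any ancestor contains a WORKSPACE file.
func inBazelWorkspace(dir string) bool {
	for {
		if _, err := os.Stat(filepath.Join(dir, "WORKSPACE")); err == nil {
			return true
		}
		parent := filepath.Dir(dir)
		if parent == dir {
			return false
		}
		dir = parent
	}
}

func main() {
	wd, err := os.Getwd()
	if err != nil {
		os.Exit(1)
	}
	if !inBazelWorkspace(wd) {
		// Not a Bazel workspace: defer to go list (returning a "not handled"
		// response to go/packages is another option).
		cmd := exec.Command("go", append([]string{"list", "-json"}, os.Args[1:]...)...)
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		if err := cmd.Run(); err != nil {
			os.Exit(1)
		}
		return
	}
	// ... Bazel-backed loading goes here ...
}
```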

Anyone implementing this needs to be intimately familiar with Bazel rules, aspects, golang.org/x/tools/go/packages, and all layers of the rules_go stack. It wouldn't hurt to know how go list and gopls both work. A lot of design work needs to go into this. I have a rough idea of what needs to be done, but I don't have a blueprint.

If someone is willing to learn all this and implement it, I'd be willing to accept PRs only if you're willing to take at least partial ownership of rules_go. There will definitely be a lot of bugs in anything this big and complicated. Some of those may actually be gopls or packages bugs that start out here, but the whole stack needs to be debugged. Many of the reported issues will be pretty vague. I'll only invest time in mentoring and reviewing someone if they're willing to stay and help with support, bug fixing, and code health long-term.

Thanks for the detailed reply, Jay!

I'll only invest time in mentoring and reviewing someone if they're willing to stay and help with support, bug fixing, and code health long-term.

This is a really, really generous offer. I'll poke through the network of people I know are interested in this issue and see if I can find someone who's funded to help develop rules_go as part of their day job.

I've written a design doc on the wiki at Editor and tool integration. I think I have a better understanding of how to deal with test packages and overlays, which were the biggest area of uncertainty for me before.

Folks who are interested in working on this, please take a look and let me know (here or on Gophers Slack) if you have thoughts or questions.

I have some code that parses the configuration, runs bazel, parses the Bazel proto output, and formats package output. I'll clean that up and put it on a branch next week. There's still a lot of work to do after that though.

@jayconrod Is https://github.com/jayconrod/rules_go/tree/dev-gopackagesdriver the right branch to be looking at for your latest work? I was curious about helping out.

@jmhodges feature/gopackagesdriver has a skeleton implementation. That has as much code from the dev-gopackagesdriver branch as I thought was usable.

@vitarb was working on this a while ago. Not sure if he's made progress.

Plunked down a commit that adds an initial failing go_bazel_test on my own feature/gopackagesdriver branch.

I'm going to pick this back up tomorrow, and, since I've paused, I wanted to see if this was the kind of thing you meant for the testing strategy in the design doc.

So, I've gotten a good bit further.

@jayconrod The design doc talks about distinguishing between test targets and non-test targets but I'm not sure how we'd do this from inside the aspect in order to get the IDs set correctly. We, of course, don't have a GoTest provider currently. Have I missed an API we can call inside skylark to grab the target's kind or should we create a GoTest provider for this? Or some other better option I didn't see?

Relatedly, as I worked on this, I realized we probably don't want the non-recursive aspects (e.g. gopackagesdriver_files) to produce output on go_binary targets with embed set because we'd duplicate the work done on the original go_library that was embedded and possibly confuse downstream tooling.

Do you have any thoughts on how to distinguish go_binary from go_test in the aspect without also accidentally no longer working in the go_test case itself?

There's a GoBinary provider we can check the target for, but no equivalent GoTest provider. Perhaps I'm missing something obvious!

@jmhodges I haven't thought much about how tests and binaries will work with this specifically. It's possible additional information will need to be added to the current set of providers.

Are you coordinating with @vitarb on this? I pinged him on Slack yesterday, but I haven't heard anything in a while.

Again though, I'd like to repeat my earlier comment and caution that I'll be very hesitant to accept any PRs for this. Getting it working is one thing, but supporting it is going to take a much larger amount of time that I can't commit to on my own. Making gopls and go/packages work with go list is already a lot of work.

@jayconrod Cool! I'm happy to hear I'm not completely in the wrong direction.

I totally understand your desire not to put it in the mainline rules_go repo!

I'm plopping questions and updates here even though the tool won't necessarily live here, because the design doc has been so great and useful, and I figured we'd want some record of things folks run into while implementing it. And I don't think we'll know what needs to be exposed by rules_go for a third-party tool to exist until we do some of the work to build it. Plus, it makes it a lot easier to ask for and get smaller rules_go API changes later if we're upfront and intentionally communicating about what we've been trying.

Finally, I'm not sure which Slack you're talking about. (The #bazel channel on gophers.slack.com doesn't seem to be active enough to be the one you mean? But maybe I'm wrong.) I figured this ticket was a good place to put work since he and I are strangers and there don't seem to be other ways for me to have private conversations with him.

Just want to reiterate how incredibly dope that design doc has been: It's been incredibly dope. Thank you for writing it.

Also, maybe you meant the bazelbuild slack? I've plopped myself into it in case you did!

Glad the design doc has been useful. I expect this is the kind of project where there will be a lot of unanticipated issues that a doc won't be able to address ahead of time. It seems like you're running into a few of those already with tests, so I do appreciate the updates here.

About Slack, I did mean the Gophers Slack. I should keep a closer eye on the bazelbuild Slack -- seems the #go channel there has picked up more than the #bazel channel on Gophers Slack.

Dope, I've got the first simple single package pattern packages.Load test passing in my branch. Lots of FIXMEs while I'm learning things.

Debating whether to next build handling for the single pattern file= query type or handling a multiple pattern query (e.g. packages.Load(nil, "//:hello", "//:goodbye") instead of packages.Load(nil, "//:hello")). Might be the former so I can see this playing nice with gopls query definition.

Also, hope you all had a happy climate strike!

Hm, getting stuck on supporting packages.NeedCompiledGoFiles. The CompiledGoFiles field is expected to include the intermediate cgo-generated Go files that include things like the cgo types, etc.

(The CompiledGoFiles field is meant to be the same as the -compiled flag in go list -compiled when run against a cgo project. This is one of the packages.LoadModes that gopls turns on when doing its thing.)

I wasn't able to find those cgo-generated Go files in GoArchiveData's srcs field, its orig_srcs field, GoArchive, or GoSource. I suspect I missed something obvious.

Where could I find those files in the API?

Another question!

It looks like you need a non-empty nogo setup[1] in order to get the type-check export data file output and GoArchive's export_file set to it. There are some if go.nogo: checks in various places that only export the file and set the fields if it's true.

(That's the export file that's like Export in go list; it's asked for by gopls via the packages.NeedExportsFile mode.)

I'm not sure how to go about either a) requiring nogo to be set up (failing is obvious, of course, but it's not clear which parts of nogo count as the minimum needed, and that feels fairly arbitrary) or b) getting this export data added without needing nogo.

Do you have some ideas for that?

[1] That is, not like the "default_nogo" in rules_go but more like the "tools_nogo" one.

Debating whether to next build handling for the single pattern file= query type or handling a multiple pattern query (e.g. packages.Load(nil, "//:hello", "//:goodbye") instead of packages.Load(nil, "//:hello")). Might be the former so I can see this playing nice with gopls query definition.

I think multiple packages can be handled by making the driver build multiple targets in a single bazel command. A single target could turn into many Go packages (with -deps), so we need to handle multiple packages anyway.

file= I think is pretty tricky. I think we need to load a set of targets that could contain that file, then check if they actually do. Approximating this is probably good enough: we can make some assumptions based on importpath and directory structure.

Hm, getting stuck on supporting packages.NeedCompiledGoFiles. The CompiledGoFiles field is expected to include the intermediate cgo-generated Go files that include things like the cgo types, etc.

They're not exposed in the API anymore. We used to have separate actions for coverage and cgo code generation, but this was complicated and expensive. Now everything is in GoCompilePkg.

The aspect may need to expose these files somewhere, and GoCompilePkg will need to be told where to write them. The interface between the aspect, gopackagesdriver, and GoCompilePkg is private, so no need to worry about compatibility there.

It looks like you need a non-empty nogo setup[1] in order to get the type-check export data file output and GoArchive's export_file set to it. There are some if go.nogo: checks in various places that only export the file and set the fields if it's true.

(That's the export file that's like Export in go list; it's asked for by gopls via the packages.NeedExportsFile mode.)

You should point to the compiled .a file produced by GoCompilePkg here, not nogo's .x file.

Currently, the compiler produces machine code and export data as separate sections within the .a file, and nogo produces facts in the .x file. We should move the export data into the .x file (#1803, #2199). The Go linker does not support this in some build modes though, so we're currently thinking about copying the export data. Anyway, it will be in the .a file for the foreseeable future.

Also, hope you all had a happy climate strike!

I did! I went to the one in NYC. Far better turnout than I expected. I heard an estimate of ~60,000 people.

I feel lucky to get such nice replies to my questions.

I think multiple packages can be handled by making the driver build multiple targets in a single bazel command.

Great! This is the tactic I was hoping to take. It'll be relatively understandable to extend the current packagesFromBazelTargets with its visit func etc to do that.

file= I think is pretty tricky. I think we need to load a set of targets that could contain that file, then check if they actually do. Approximating this is probably good enough: we can make some assumptions based on importpath and directory structure.

I've actually got a version of file= working okay. I took a similar strategy applied to bazel labels. That is, the current gopackagesdriver turns the given file path into its equivalent bazel label, and then replaces the trailing : part with a :* to match all targets in the target's "directory" ("package"? whatever we call it) and then used that splatted label in bazel query with some attrs work on the packages inside it. That code currently lives in fileLabelToBazelTargets.

That seems like it would hit more than 80% of use cases. It will, however, need some work to handle stdlib packages.
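
A rough sketch of that file-to-label splatting (a hypothetical simplification of fileLabelToBazelTargets; it ignores the case where the nearest BUILD file is in an ancestor directory):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// fileQueryToLabelPattern converts a workspace-relative file path like
// "somedir/somefile.go" into a wildcard label like "//somedir:*" that can
// be fed to `bazel query`.
func fileQueryToLabelPattern(relPath string) string {
	dir := filepath.ToSlash(filepath.Dir(relPath))
	if dir == "." {
		return "//:*"
	}
	return fmt.Sprintf("//%s:*", dir)
}

func main() {
	fmt.Println(fileQueryToLabelPattern("somedir/somefile.go")) // //somedir:*
}
```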

The interface between the aspect, gopackagesdriver, and GoCompilePkg is private, so no need to worry about compatibility there.

Ahh, cool. So GoCompilePkg could maybe be reified into a (private?) provider to get at the cgo stuff? Or maybe something else entirely? (I'm assuming the private provider is the answer, but I've been wrong before)

You should point to the compiled .a file produced by GoCompilePkg here, not nogo's .x file.

Thank you! This unblocked me nicely. Now I get to solve the stdlib package problem.

On the topic of stdlib packages: gopls asks for info about the builtin stdlib package when it starts up, and I've hacked in a fake lookup for that (builtin is very special: no ExportFile or CompiledGoFiles ever, for instance). But trying to get complete coverage for stdlib lookups, like a lot of stuff interacting with stdlib packages, definitely needs some work.

I've kind of assumed I'm just going to have to create the stdlib packages artificially with some packages_list, some staring at go list -json -export -compiled -test std for other special cases, and some GoContext.sdk and GoStdLib manipulation.

But I'm very open to alternative ideas!

(I'd also been wondering what to do in terms of future compatibility with #1803, so if there's stuff I should be thinking about specifically, let me know.)

That seems like it would hit more than 80% of use cases. It will, however, need some work to handle stdlib packages.

Packages in std are problematic because we can't name them using Bazel labels, and the dependencies are implicit.

In an earlier prototype, I was working on something that would build a .zip file with .json package data for each package in std or cmd. Any action that might produce .json package data for a regular package could depend on that .zip and pull out whatever files it needs. Those .json files could be produced by running packages.Load with GOPACKAGESDRIVER=off, so it does the regular loading with go list. All the type information would be there, so if a user asks for less later on (e.g., AST but not types), it can just be filtered out.

As an aside, I'd like to evaluate handling compilation like that too: there would be a .zip of .a files, and each GoCompilePkg action could depend on that and pull out what it needs instead of depending on every individual .a file. That has an enormous amount of overhead for remote execution.
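
Back to the .json idea: a hedged sketch of producing that std package data with the default loader (GOPACKAGESDRIVER=off is honored by go/packages; the mode flags are illustrative):

```go
package main

import (
	"encoding/json"
	"log"
	"os"

	"golang.org/x/tools/go/packages"
)

func main() {
	cfg := &packages.Config{
		Mode: packages.NeedName | packages.NeedFiles | packages.NeedDeps,
		// Force the default go list loader even if a driver is installed.
		Env: append(os.Environ(), "GOPACKAGESDRIVER=off"),
	}
	pkgs, err := packages.Load(cfg, "std")
	if err != nil {
		log.Fatal(err)
	}
	enc := json.NewEncoder(os.Stdout) // one JSON object per std package
	for _, pkg := range pkgs {
		if err := enc.Encode(pkg); err != nil {
			log.Fatal(err)
		}
	}
}
```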

I did try to figure out a way to repurpose go/packages’s internal go list driver that did that work. But I thought the .a’s were in unexpected places on disk that might make that infeasible (without resorting to using whatever Go they had installed outside of their repo). But it seems I was wrong about those filepaths! (Maybe they changed?)

So if we set GOPATH to our GOROOT, that go list code seems like it would work for stdlib stuff. We’d need some rewriting of the returned IDs and PkgPaths of those stdlib packages to fake bazel labels we can reinterpret in our gopackagesdriver. But this seems plausible.

I keep meaning to write this down, but we'll want a cross-compile story so that folks can use this tool to work with APIs not in their host machine's Go SDK. Folks setting up gopls in their editor will likely be more familiar with using the GOOS and GOARCH env vars to pick the right platform than with using Bazel's --platforms. And I'm not immediately seeing an easy way for gopls to pass a --platforms argument value down to $GOPACKAGESDRIVER.

So, it might be best to just do the work of getting gopackagesdriver to understand GOOS and GOARCH and map to a suitable-enough platform. But this is an open question and more exploration would be nice. (This is something that's sort of blocking filling in the Sizes field on the returned pkg objects.)

I did try to figure out a way to repurpose go/packages’s internal go list driver that did that work. But I thought the .a’s were in unexpected places on disk that might make that infeasible (without resorting to using whatever Go they had installed outside of their repo). But it seems I was wrong about those filepaths! (Maybe they changed?)

So if we set GOPATH to our GOROOT, that go list code seems like it would work for stdlib stuff.

That should work. We should only need to set GOROOT, not necessarily GOPATH. Even when we're cross-compiling the standard library, we use go install std, so the package directory layout should be what go list expects (though we'll need to be careful with -installsuffix).

We’d need some rewriting of the returned IDs and PkgPaths of those stdlib packages to fake bazel labels we can reinterpret in our gopackagesdriver. But this seems plausible.

Earlier, I was thinking something like @io_bazel_rules_go//:stdlib%fmt (where % is some separator character). Haven't given it much thought though, not sure if that would cause any problems.
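
For illustration, round-tripping that encoding (the label shape and % separator are just the suggestion above, not a settled format):

```go
package main

import (
	"fmt"
	"strings"
)

const stdlibLabelPrefix = "@io_bazel_rules_go//:stdlib%" // suggested, not final

// stdlibLabel encodes a std import path ("fmt") as a fake Bazel label.
func stdlibLabel(pkgPath string) string {
	return stdlibLabelPrefix + pkgPath
}

// stdlibPkgPath decodes a fake label back, reporting whether it was a std one.
func stdlibPkgPath(label string) (string, bool) {
	if !strings.HasPrefix(label, stdlibLabelPrefix) {
		return "", false
	}
	return strings.TrimPrefix(label, stdlibLabelPrefix), true
}

func main() {
	label := stdlibLabel("fmt")
	pkg, ok := stdlibPkgPath(label)
	fmt.Println(label, pkg, ok) // @io_bazel_rules_go//:stdlib%fmt fmt true
}
```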

I keep meaning to write this down, but we'll want a cross-compile story so that folks can use this tool to work with APIs not in their host machine's Go SDK. Folks setting up gopls in their editor will likely be more familiar with using the GOOS and GOARCH env vars to pick the right platform than with using Bazel's --platforms. And I'm not immediately seeing an easy way for gopls to pass a --platforms argument value down to $GOPACKAGESDRIVER.

So, it might be best to just do the work of getting gopackagesdriver to understand GOOS and GOARCH and map to a suitable-enough platform. But this is an open question and more exploration would be nice. (This is something that's sort of blocking filling in the Sizes field on the returned pkg objects.)

Build flags like --platforms should be passed through packages.Config.BuildFlags. gopls is responsible for setting that. It's expected that those flags are specific to the underlying build system (not just go build). We'll have more flags in the future for things like static, pure, debug, too.

The patterns are build-system-specific, too (Bazel labels in our case). I am worried about tools assuming the underlying build system is always the go command. We have a bit of a monoculture right now.
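
For example, a client configured for a Bazel workspace might look like this (the platform label and target label are illustrative):

```go
package main

import (
	"fmt"
	"log"

	"golang.org/x/tools/go/packages"
)

func main() {
	cfg := &packages.Config{
		Mode: packages.NeedName | packages.NeedFiles,
		// Build-system-specific flags travel through BuildFlags; a Bazel
		// driver would forward them to its bazel invocations.
		BuildFlags: []string{"--platforms=@io_bazel_rules_go//go/toolchain:linux_amd64"},
	}
	// Patterns are build-system-specific too: a Bazel label, not an import path.
	pkgs, err := packages.Load(cfg, "//cmd/server:server")
	if err != nil {
		log.Fatal(err)
	}
	for _, pkg := range pkgs {
		fmt.Println(pkg.PkgPath)
	}
}
```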

Hey, so here's a thing. I've been relying on the deps in our targets to generate the packages.Package values we need in Package.Imports. But we don't actually keep track of which stdlib libraries are used by a go_library target. We just pass all of the stdlibs in as the archives in emit_compilepkg and go from there.

My question is: did I miss somewhere else where we might pare down that list of stdlibs into something smaller and, if so, should I repurpose it to extract which stdlibs a bazel target is importing?

Or should I be focusing on building an AST walking tool to extract which stdlibs a bazel target is importing?

I'm suspecting it's the latter!

It's definitely the latter, unfortunately. We need to read the sources to find the actual imports 1) to find stdlib dependencies, and 2) to filter out files that were excluded by build constraints.

Most of the code to do this is already in //go/tools/builder. We need to apply build constraints and read imports in order to generate an importcfg file to pass the compiler. We also need to load AST and type information for nogo.
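
A minimal sketch of that scan (go/build's MatchFile applies build constraints; parser.ImportsOnly stops after the import block):

```go
package main

import (
	"fmt"
	"go/build"
	"go/parser"
	"go/token"
	"log"
	"path/filepath"
	"strconv"
)

// scanImports returns the imports of the files in dir that survive
// build-constraint filtering, mirroring what the builder must do.
func scanImports(dir string, names []string) ([]string, error) {
	ctx := build.Default
	fset := token.NewFileSet()
	var imports []string
	for _, name := range names {
		if ok, err := ctx.MatchFile(dir, name); err != nil || !ok {
			continue // excluded by build tags or file name suffix
		}
		f, err := parser.ParseFile(fset, filepath.Join(dir, name), nil, parser.ImportsOnly)
		if err != nil {
			return nil, err
		}
		for _, imp := range f.Imports {
			path, _ := strconv.Unquote(imp.Path.Value)
			imports = append(imports, path)
		}
	}
	return imports, nil
}

func main() {
	imps, err := scanImports(".", []string{"main.go"})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(imps)
}
```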

So, for folks wondering what the status is so far:

I've gotten a pretty good distance! Basic lookups for non-stdlib libraries work fine.

Because bazel can't be run inside a command that's being bazel run'ed (though it can if the binary is built and then run outside of bazel), and because I want the gopackagesdriver binary to be go get-able, I've built two different binaries.

One binary, in go/tools/gopackagesdriver, is to be go get-able and sets up an environment making it possible to run go/tools/gopackagesdriver-nonshim correctly (gopackagesdriver being the "shim" referenced here). That gopackagesdriver-nonshim is where the actual running of bazel targets with the aspects enabled happens.

I've left a long comment explaining some of this decision-making in the gopackagesdriver's main.go. It goes into details about keeping stdlib look ups consistent with the Go version the user is building with, etc.

I've also gotten stuck on what I think is a bug in go/packages. I made golang/go#34851 and hopefully I'll hear back soon. The summary is that while go/packages.Load does distinguish between IDs (our bazel labels) and PkgPaths (the Go import paths), I think there's a problem in its handling of Imports that confuses the two. (Or it could be that I missed something obvious.)

There's also been a lot of back and forth over where code should live and how to handle looking up stdlibs. It's required some code generation using rules_go's stdlib packages_list.txt because some kinds of patterns that Load will pass to -nonshim need to be translated into bazel labels before the aspects can receive them.

It also might be that we need a third binary that's called from the aspects to do some of the work done by -nonshim, because it could make the cross-compilation use case easier to implement and make more of the work -nonshim is currently doing cacheable by bazel itself.

@jmhodges Just a small remark: when a binary is executed under bazel run it can call bazel again (that limitation was removed quite some time ago); I am doing that quite a fair amount. But if the binary is executed under bazel build it cannot execute bazel itself. That aside, it sounds like very promising progress, and I'm very excited to see this coming together :)

Good news! I've got LoadModes working, including NeedDeps, for most go_library and go_binary targets. It can be used for non-test code, slowly.

The next steps are to add go_test support, clean up the FIXMEs, speed up the lookups, get access to those intermediate cgo-generated Go files, and handle the go_binary embed case.

There are a bunch of debugging statements, some needed refactoring, and yadda yadda, but, hey, the tests pass and I can use it in my editor as $GOPACKAGESDRIVER. I've pushed it up to jmhodges/feature/gopackagesdriver in my fork.

The lookup speed is too slow for common use, but there's been basically zero work done to make it fast. I imagine some of the slowness is because of the uncached (by bazel) ParseFile results we have to do for stdlib import discovery. But until I get some metrics in place, I don't want to commit to a strategy.

@jmhodges This is awesome! I'm excited you're making so much progress on this.

Sorry I haven't been much help. I'm hoping to have some spare cycles in the next month or two after the Go 1.14 freeze starts. I have some ideas about caching or pregenerating the results for std packages.

Ooo I'm curious about those ideas!

So, I'm looking at the code shared between compilepkg.go, the cgo generation code I'm extracting from compilepkg.go, and gopackagesdriver-nonshim, and there seem to be a lot of bits of code that care about the import paths and package names used in a Go package's source files.

I'm thinking we may want a new step in rules_go where we pull the import paths and package names set in the Go source files out in a reusable way (a sketch of that extraction follows the list below).

The specific uses I've seen (so far):

  • Parsing for import paths used in the given Go source files:

    • The cgo code uses it (via filter.go's filterAndSplitFiles) to find the files that import "C".
    • compilepkg.go uses it to check that all of the import paths used are satisfied by the bazel target's dependencies.
    • gopackagesdriver uses them to find imports of stdlibs to add to go/packages.Package's Imports, and to gather all of the imports that stdlib packages use (which are also all just stdlibs, by definition).
  • Parses for package names set in the given Go source files:

    • The cgo code needs the package name for the -dynpackage flag passed to go tool cgo.
    • compilepkg.go uses it in the testfilter = "only" code to find the correct test files to include (and fixing the TODO to push that to the go_test_macro code would require this package name, too).
    • gopackagesdriver needs it to set the Package.Name field correctly in the embed cases and when an import path and the package name listed in the Go source disagree (e.g. gopkg.in/yaml.v2, and so on).
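
A sketch of the shared extraction those bullets point at; one parse pass yields the package name and the imports, including whether a file imports "C" (the names here are hypothetical):

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"log"
	"strconv"
)

// fileInfo is the kind of per-file record the proposed step would emit.
type fileInfo struct {
	PkgName string   // package clause, e.g. "yaml" for gopkg.in/yaml.v2
	Imports []string // import paths used by the file
	IsCgo   bool     // true if the file imports "C"
}

func inspectFile(path string) (fileInfo, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, path, nil, parser.ImportsOnly)
	if err != nil {
		return fileInfo{}, err
	}
	info := fileInfo{PkgName: f.Name.Name}
	for _, imp := range f.Imports {
		p, _ := strconv.Unquote(imp.Path.Value)
		info.Imports = append(info.Imports, p)
		if p == "C" {
			info.IsCgo = true
		}
	}
	return info, nil
}

func main() {
	info, err := inspectFile("main.go")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", info)
}
```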

My first thought was that this could be a step in emit_archive[1]. I'm not sure what format would be best to carry that information to compilepkg and am looking for advice on that. A JSON dump is not something we can use from within skylark but maybe that's okay. @jayconrod, got a preference?

[1] Which allows us to push the testfilter work out of compilepkg.go, but doesn't push it all the way to the test macro. If we want that testfilter stuff to go out to that macro, we've also got to pick a data format that isn't JSON or add another Go binary to call in that path.

Wait, is this what importcfg is meant to be for? I'm still working out what the differences are between the ForCompile and ForLink cases.

Ah, nah, the package name doesn't make it in there.

The GoCompilePkg action should be as consolidated as possible. Bazel actions have a lot of overhead in some situations (particularly with remote execution and sandboxing on operating systems with slow file I/O). It's fine for the code to be split up and reused, it's just that we need to be able to build each package with a single action (or as close to that as possible). Indeed, the code that generates JSON blobs that the driver will read should be integrated into the builder.

I thought about having GoCompilePkg emit a JSON blob that the driver could use by default. That would work for configurations that don't need AST or type information. Not sure how common that is and whether it's worth optimizing for. Probably better to keep GoCompilePkg fast.

Wait, is this what importcfg is meant to be for? I'm still working out what the differences are between the ForCompile and ForLink cases.

importcfg is a file the compiler and linker can read. It maps import paths to actual package paths. Package paths must be unique in a linked binary and may include stuff like vendor directories. importcfg also gives the location of each .a file the compiler and linker will care about.
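
For reference, a small example of what an importcfg might contain (the paths are made up):

```go
package main

import "fmt"

// An importcfg is a plain-text file of directives. importmap lines remap
// import paths (e.g. for vendoring); packagefile lines give the .a location
// for each dependency the compiler or linker will read.
const exampleImportcfg = `importmap github.com/example/app/vendor/gopkg.in/yaml.v2=gopkg.in/yaml.v2
packagefile fmt=bazel-out/k8-fastbuild/bin/stdlib_/fmt.a
packagefile gopkg.in/yaml.v2=bazel-out/k8-fastbuild/bin/vendor/yaml.a
`

func main() { fmt.Print(exampleImportcfg) }
```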

Okay, your response is making me rethink some things. Stepping back, how would you recommend gopackagesdriver find the paths to the cgo-generated Go files?

Okay, your response is making me rethink some things. Stepping back, how would you recommend gopackagesdriver find the paths to the cgo-generated Go files?

It has to be done in execution by the builder, at least partially integrated with GoCompilePkg. Currently, those files just exist temporarily as part of the GoCompilePkg action. They're deleted after the compiler runs. Nothing in the analysis phase knows about them.

I don't know whether we need to preserve those files for clients of golang.org/x/tools/go/packages. Possibly not? For example, a tool like godef would show the user the original cgo file, not the generated file, but it might need to read a //line comment in the generated file or something.

The generated Go files are used for providing type and syntax information and are fed into auto-complete, refactoring, and other tools. go/packages uses the files in CompiledGoFiles (which include cgo-generated Go files) instead of GoFiles and similar fields on the driver-returned packages.Packages for that purpose. If they didn't, they'd have to reimplement what is probably a large amount of cgo.

Pushed a little further on the cgo support:

Adding the cgo-generated Go file to GoSource.srcs in _library_to_source isn't workable because srcs would both be an input to the compilepkg builder and contain an output of it (the cgo-generated file), which creates a cyclic dependency error.

Some assumptions I've made:

  • srcs is the place to put the cgo-generated file because it's documented as the place to put generated cgo files (that documentation lives on GoSource.orig_srcs).
  • we want srcs to not change once it's left _library_to_source. I've not seen a place where we change srcs after _library_to_source is called, but it's possible I missed some!

Is my immutability assumption correct? I've been operating on that assumption because it seems the go proto work generates srcs before plugging them into go.archive, and it was the only place I could find doing extra srcs work.

Seems like you're on the right track. GoSource should be immutable once it's created by _library_to_source, though if there's a resolve function, that can modify the fields before GoSource is created.

orig_srcs and srcs are usually (always?) the same. Before actions were consolidated into GoCompilePkg, there were separate actions that did cgo code generation and coverage. Both of those caused srcs to differ and orig_src_map to map the original files to the generated files.

These fields have been left in place for compatibility, but perhaps it's time to make use of them again. srcs and orig_srcs are roughly analogous to GoFiles and CompiledGoFiles.

One problem is that I'm not sure they belong in GoSource, which is basically a description of a go_library rule suitable for embedding. GoArchiveData also has srcs and orig_srcs fields. That's really where they belong. emit_archive should be the function that declares these files, not _library_to_source.

I've broken out the cgo work as #2271 and I'm ready for it to be reviewed.

It does the thing we talked about: adds to GoArchiveData.srcs only.

With #2271, we've got cgo support working in jmhodges/feature/gopackagesdriver.

Any update on this?

@raliste it works for me in IDEA Ultimate + the Bazel plugin, with the following limitation: autocomplete works only if all files inside a Go package are auto-generated. If you mix auto-generated and manually written files, the auto-generated files are not inspected.

I am having a similar issue with manually written files and it is frustrating; did you find any workarounds?
Edited: Manually generated -> manually written*

Just in case: I wrote to this ticket, but after that I found an IDEA plugin which solved my problems completely: https://plugins.jetbrains.com/plugin/8609-bazel

The GoCompilePkg action should be as consolidated as possible. Bazel actions have a lot of overhead in some situations (particularly with remote execution and sandboxing on operating systems with slow file I/O). It's fine for the code to be split up and reused, it's just that we need to be able to build each package with a single action (or as close to that as possible). Indeed, the code that generates JSON blobs that the driver will read should be integrated into the builder.

I thought about having GoCompilePkg emit a JSON blob that the driver could use by default. That would work for configurations that don't need AST or type information. Not sure how common that is and whether it's worth optimizing for. Probably better to keep GoCompilePkg fast.

I've been digging into this a little lately. The aspect approach does seem to be too slow, and it's causing bazel to run out of memory when I try to use it on a large Go project. Adding the JSON generation to GoCompilePkg seems like a much better idea.

gopls seems to be the main client of go/packages, and it always uses the same configuration. Maybe we should optimize for that first? It doesn't require types, so it seems like the Go builder could just output the JSON files by default. I'm wondering if the driver even needs to run bazel. I get annoyed when I have to wait for the editor's bazel commands to finish so I can use bazel from the command line, and I'm pretty sure I don't have enough memory to have two copies of bazel running.

I wrote a hacky library that can solve this for protobufs, and it can be extended to anything that provides go_generated_srcs: https://github.com/nikunjy/golink

Hey @jmhodges, would you be willing to contribute your work on this so far? @tomlu is interested in continuing this, and your work would be a great starting point.

If you open a PR from the feature/gopackagesdriver branch in your fork, I can merge that onto a dev branch as-is, and we can take it from there. No need to update or rebase. That should keep our open source lawyercats happy.

Is there any work going on to make this a reality?

Sorry, no, not at the moment.

I guess Go modules and their replace statements can solve some of these issues. But still, it would be nice to have something that would work more nicely.

I'm hoping to have time to work on this next month.

@ribrdb This is not a small project. Please read this comment.

bobg commented

There is another, much simpler way: include generated code in the source tree. Then everything Just Works.

I'm aware that this is contrary to the Bazel philosophy, but I consider the idea that some code should exist only at build time to be a misguided antipattern. As the aphorism says, code - even generated code - is primarily for humans to read, and only secondarily for computers to execute. And as the various other kinds of tooling breakage shows, there's more to do with source code than simply compile it into a binary. It belongs in the source tree.

This does mean changes to rules_go. In particular it means that where rules_go presently generates code, it should validate generated code instead. Builds should fail if the source-tree copy of any generated code is out of date with respect to its upstream source (such as a .proto file). Ideally the same logic that validates generated code at bazel build time could also update it with the right bazel run command. I'm working on exactly this right now - it's the motivation behind this Slack question - but my skill with writing Bazel rules is low. It would make my life a lot easier if rules_go went this way, and it would obviate the many difficulties outlined in Jay's long comment.

@bobg Copying generated code back to the source tree is a fine workaround. For folks that need to interoperate with other build systems like go build that don't support code generation, it's necessary.

rules_go (and Bazel rules in general) cannot do that though. Output files are always declared in a separate tree. Actions may run on a different machine. Even when they're run locally, they're usually sandboxed to prevent that kind of thing.

Generated files could be copied by a tool executed with bazel run, but that doesn't need to be part of rules_go.

I'm using generated_file_test from rules_nodejs which takes two files, one under version control, and another that's generated by some arbitrary rule, and generates a test target that validates the two match. It also generates a ".update" target which updates the VCS file.

bobg commented

Ooh, that looks super handy, thanks @duarten. What are the prospects for adding something like that to rules_go?

Another possibility would be to try and get generated_file_test into bazel-skylib or something.