Switching to Protobuf v2 breaks generation of K8S protos
hagmonk opened this issue · 15 comments
Commit e60b2f4 switched to Go Protobuf v2, but this breaks the K8S example.
Reverting to f561f12 does not exhibit the same panic.
> go run ./main.go TEST IT ./nginx.cfg
panic: field k8s.io.api.core.v1.ContainerPort.name has invalid type: got string, want pointer
goroutine 1 [running]:
google.golang.org/protobuf/internal/impl.fieldInfoForScalar(0x2069a40, 0xc000195900, 0x1d31f8b, 0x4, 0x0, 0x0, 0x2069380, 0x1d526c0, 0x1d31f91, 0x36, ...)
/Users/lburton/pkg/mod/google.golang.org/protobuf@v1.25.1-0.20201016220047-aa45c4675289/internal/impl/message_reflect_field.go:228 +0x7d7
google.golang.org/protobuf/internal/impl.(*MessageInfo).makeKnownFieldsFunc(0xc000478420, 0xffffffffffffffff, 0x0, 0x0, 0xffffffffffffffff, 0x0, 0x0, 0xffffffffffffffff, 0x0, 0x0, ...)
/Users/lburton/pkg/mod/google.golang.org/protobuf@v1.25.1-0.20201016220047-aa45c4675289/internal/impl/message_reflect.go:74 +0x95d
google.golang.org/protobuf/internal/impl.(*MessageInfo).makeReflectFuncs(0xc000478420, 0x2069380, 0x1e3fa20, 0xffffffffffffffff, 0x0, 0x0, 0xffffffffffffffff, 0x0, 0x0, 0xffffffffffffffff, ...)
/Users/lburton/pkg/mod/google.golang.org/protobuf@v1.25.1-0.20201016220047-aa45c4675289/internal/impl/message_reflect.go:42 +0x65
google.golang.org/protobuf/internal/impl.(*MessageInfo).initOnce(0xc000478420)
/Users/lburton/pkg/mod/google.golang.org/protobuf@v1.25.1-0.20201016220047-aa45c4675289/internal/impl/message.go:91 +0x185
google.golang.org/protobuf/internal/impl.(*MessageInfo).init(...)
/Users/lburton/pkg/mod/google.golang.org/protobuf@v1.25.1-0.20201016220047-aa45c4675289/internal/impl/message.go:73
google.golang.org/protobuf/internal/impl.(*messageReflectWrapper).ProtoMethods(0xc00033b2c0, 0xc000309a80)
/Users/lburton/pkg/mod/google.golang.org/protobuf@v1.25.1-0.20201016220047-aa45c4675289/internal/impl/message_reflect_gen.go:150 +0x53
google.golang.org/protobuf/proto.protoMethods(...)
/Users/lburton/pkg/mod/google.golang.org/protobuf@v1.25.1-0.20201016220047-aa45c4675289/proto/proto_methods.go:18
google.golang.org/protobuf/proto.mergeOptions.mergeMessage(0x2063d60, 0xc00033b2c0, 0x2063d60, 0xc00033b2b0)
/Users/lburton/pkg/mod/google.golang.org/protobuf@v1.25.1-0.20201016220047-aa45c4675289/proto/merge.go:67 +0x3b
google.golang.org/protobuf/proto.Clone(0x20250a0, 0xc00033b2b0, 0x20250a0, 0xc00033b2b0)
/Users/lburton/pkg/mod/google.golang.org/protobuf@v1.25.1-0.20201016220047-aa45c4675289/proto/merge.go:58 +0xad
github.com/golang/protobuf/proto.Clone(0x74d19f0, 0xc0002d9300, 0x0, 0x0)
/Users/lburton/pkg/mod/github.com/golang/protobuf@v1.4.1/proto/proto.go:130 +0x5d
github.com/stripe/skycfg/internal/go/skycfg.(*skyProtoMessageType).CallInternal(0xc0002d9340, 0xc0003e1bc0, 0x0, 0x0, 0x0, 0xc0001287a0, 0x1, 0x1, 0x0, 0xc000080900, ...)
/Users/lburton/src/github.com/stripe/skycfg/internal/go/skycfg/proto_message_type.go:155 +0xdb
go.starlark.net/starlark.Call(0xc0003e1bc0, 0x20576c0, 0xc0002d9340, 0x0, 0x0, 0x0, 0xc0001287a0, 0x1, 0x1, 0x0, ...)
/Users/lburton/pkg/mod/go.starlark.net@v0.0.0-20201204201740-42d4f566359b/starlark/eval.go:1191 +0x1a5
go.starlark.net/starlark.(*Function).CallInternal(0xc000400bc0, 0xc0003e1bc0, 0xc000197810, 0x1, 0x5, 0x0, 0x0, 0x0, 0xc0002bee00, 0x0, ...)
/Users/lburton/pkg/mod/go.starlark.net@v0.0.0-20201204201740-42d4f566359b/starlark/interp.go:325 +0x3e8e
go.starlark.net/starlark.Call(0xc0003e1bc0, 0x20577c0, 0xc000400bc0, 0xc000197810, 0x1, 0x5, 0x0, 0x0, 0x0, 0x2057680, ...)
/Users/lburton/pkg/mod/go.starlark.net@v0.0.0-20201204201740-42d4f566359b/starlark/eval.go:1191 +0x1a5
go.starlark.net/starlark.(*Function).CallInternal(0xc000400c00, 0xc0003e1bc0, 0xc0003f8680, 0x1, 0x1, 0x0, 0x0, 0x0, 0x1, 0x273c760, ...)
/Users/lburton/pkg/mod/go.starlark.net@v0.0.0-20201204201740-42d4f566359b/starlark/interp.go:325 +0x3e8e
go.starlark.net/starlark.Call(0xc0003e1bc0, 0x20577c0, 0xc000400c00, 0xc0003f8680, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0, ...)
/Users/lburton/pkg/mod/go.starlark.net@v0.0.0-20201204201740-42d4f566359b/starlark/eval.go:1191 +0x1a5
go.starlark.net/starlark.(*Function).CallInternal(0xc000400c40, 0xc0003e1bc0, 0xc000402140, 0x1, 0x1, 0x0, 0x0, 0x0, 0x1dd1b60, 0x1e75740, ...)
/Users/lburton/pkg/mod/go.starlark.net@v0.0.0-20201204201740-42d4f566359b/starlark/interp.go:325 +0x3e8e
go.starlark.net/starlark.Call(0xc0003e1bc0, 0x20577c0, 0xc000400c40, 0xc000402140, 0x1, 0x1, 0x0, 0x0, 0x0, 0x30, ...)
/Users/lburton/pkg/mod/go.starlark.net@v0.0.0-20201204201740-42d4f566359b/starlark/eval.go:1191 +0x1a5
github.com/stripe/skycfg.(*Config).Main(0xc000400c80, 0x2052500, 0xc000130000, 0x0, 0x0, 0x0, 0x1, 0xc000400c80, 0x0, 0x0, ...)
/Users/lburton/src/github.com/stripe/skycfg/skycfg.go:358 +0x3dd
main.main()
/Users/lburton/src/github.com/stripe/skycfg/_examples/k8s/main.go:298 +0x565
exit status 2
This isn't something Skycfg can fix directly, but when the Protobuf module finishes migrating to the go-protobuf v2 API we should have some better options.
Summary of the issue:
- Kubernetes has checked in Go structs that resemble go-protobuf generated code, but which do not conform to the go-protobuf API.
- Previous versions of go-protobuf happened to work with the Kubernetes structs because the particular API violations weren't relevant to that code path.
- Newer versions of go-protobuf change how they do some reflection-based operations, which breaks on the Kubernetes structs.
Potential workarounds:
- Avoid upgrading go-protobuf to v2 in your project -- this includes avoiding upgrades of Skycfg to a version that depends on go-protobuf v2.
- Use a build system like Bazel to generate correct Go code for the Kubernetes protobufs, rather than rely on upstream's broken Go code.
In the future, Skycfg will be switching from using generated Go structs to the dynamicpb
package. This will remove any dependency on the Kubernetes structs, and it will be possible to operate on any Protobuf that a descriptor can be obtained for.
Thanks for the quick reply! Makes sense!
Use a build system like Bazel to generate correct Go code for the Kubernetes protobufs, rather than rely on upstream's broken Go code.
I actually maintain a set of Bazel rules (that I'm hoping to have released OSS one day) called rules_skycfg :) The summary is that we generate all our K8S YAML using skycfg, and have those YAML generating rules as part of the Bazel dependency graph so we can reason about them the same as we do with source code.
One challenge I've had with generating protos for K8S is that gazelle is apparently not smart enough to connect proto imports across different repos. So, it generates a bunch of BUILD files that depend on //k8s.io/apimachinery/etc
instead of @io_k8s_apimachinery//etc
for instance. Hence I've been running with build_file_proto_mode = "disable_global"
for @io_k8s_api
and @io_k8s_apimachinery
.
Is there any trick you can recommend to get proto generation working for K8S projects? The best idea I've found so far involves smothering my BUILD files with # gazelle:resolve
directives for every single proto import path which seems like it will quickly get laborious … if I can get over this hurdle I can stay up-to-date with skycfg :)
FYI, Jay as recently as October recommended simply disabling proto generation, which makes me wonder if this is particularly feasible … bazelbuild/bazel-gazelle#924 (comment).
Take a look at the # gazelle:proto_import_prefix
and # gazelle:proto_strip_import_prefix
Gazelle directives, which control the proto_library
attrs of the same name -- that functionality exists to support pretty much this exact use case (bazelbuild/bazel#3867).
If you're using go_repository()
then those directives can be placed into the build_directives
parameter.
I currently expect Skycfg to finish migrating to the dynamicpb
implementation by the end of December, at which point this should become simpler.
Awesome, thanks so much. If I can get it working I'll update this issue for posterity.
I think what I really want is that dynamicpb
implementation – if I'm guessing correctly, that would mean skycfg could operate on straight proto files without having to depend on generated code? That would simplify things dramatically :)
I know this isn't strictly skycfg related, but for the sake of people who stumble across this ... I kind of have proto generation for K8S APIs working.
I spent some time fiddling with gazelle, and it doesn't seem like those directives above solve the immediate problem. If I check out github.com/k8s/api and run gazelle -go_prefix k8s.io/api
, we end up with BUILD.bazel
files that have targets like this:
proto_library(
name = "v1_proto",
srcs = ["generated.proto"],
visibility = ["//visibility:public"],
deps = [
"//k8s.io/api/core/v1:v1_proto",
"//k8s.io/apimachinery/pkg/api/resource:resource_proto",
"//k8s.io/apimachinery/pkg/apis/meta/v1:v1_proto",
"//k8s.io/apimachinery/pkg/runtime:runtime_proto",
"//k8s.io/apimachinery/pkg/runtime/schema:schema_proto",
],
)
There is no such target //k8s.io/api/core/v1:v1_proto
. What it should be is //api/core/v1:v1_proto
. The go_library
rules it generates don't suffer from this problem. It seems like the resolution of imports to targets is b0rked for some reason. I skimmed through gazelle's source (trying not to go down a rabbit hole here) and tried a few different config options to influence the resolution and nothing had any effect whatsoever, except setting an explicit gazelle:resolve
directive.
So ...
for i in $(buildozer 'print deps' //...:%proto_library | tr -d '[]' | tr ' ' '\n' | sort | uniq); do
proto=$(sed -e 's%//%%' -e 's/:.*$/\/generated.proto/' <<<$i)
printf "gazelle:resolve proto %s %s\n" $proto $(sed -e 's%//k8s.io/api%@io_k8s_api%' -e 's%/%//%' <<<$i)
printf "gazelle:resolve proto go %s %s\n" $proto $(sed -e 's%//k8s.io/api%@io_k8s_api%' -e 's%/%//%' -e "s/:\(.*\)_proto$/:\1_go_proto/" <<<$i)
done
The output of which I tossed into my deps.bzl
file:
K8S_DIRECTIVES = [
"gazelle:resolve proto k8s.io/api/authentication/v1/generated.proto @io_k8s_api//authentication/v1:v1_proto",
"gazelle:resolve proto go k8s.io/api/authentication/v1/generated.proto @io_k8s_api//authentication/v1:v1_go_proto",
"gazelle:resolve proto k8s.io/api/batch/v1/generated.proto @io_k8s_api//batch/v1:v1_proto",
"gazelle:resolve proto go k8s.io/api/batch/v1/generated.proto @io_k8s_api//batch/v1:v1_go_proto",
"gazelle:resolve proto k8s.io/api/core/v1/generated.proto @io_k8s_api//core/v1:v1_proto",
"gazelle:resolve proto go k8s.io/api/core/v1/generated.proto @io_k8s_api//core/v1:v1_go_proto",
"gazelle:resolve proto k8s.io/apimachinery/pkg/api/resource/generated.proto @io_k8s_apimachinery//pkg/api/resource:resource_proto",
"gazelle:resolve proto go k8s.io/apimachinery/pkg/api/resource/generated.proto @io_k8s_apimachinery//pkg/api/resource:resource_go_proto",
"gazelle:resolve proto k8s.io/apimachinery/pkg/apis/meta/v1/generated.proto @io_k8s_apimachinery//pkg/apis/meta/v1:v1_proto",
"gazelle:resolve proto go k8s.io/apimachinery/pkg/apis/meta/v1/generated.proto @io_k8s_apimachinery//pkg/apis/meta/v1:v1_go_proto",
"gazelle:resolve proto k8s.io/apimachinery/pkg/runtime/schema/generated.proto @io_k8s_apimachinery//pkg/runtime/schema:schema_proto",
"gazelle:resolve proto go k8s.io/apimachinery/pkg/runtime/schema/generated.proto @io_k8s_apimachinery//pkg/runtime/schema:schema_go_proto",
"gazelle:resolve proto k8s.io/apimachinery/pkg/runtime/generated.proto @io_k8s_apimachinery//pkg/runtime:runtime_proto",
"gazelle:resolve proto go k8s.io/apimachinery/pkg/runtime/generated.proto @io_k8s_apimachinery//pkg/runtime:runtime_go_proto",
"gazelle:resolve proto k8s.io/apimachinery/pkg/util/intstr/generated.proto @io_k8s_apimachinery//pkg/util/intstr:intstr_proto",
"gazelle:resolve proto go k8s.io/apimachinery/pkg/util/intstr/generated.proto @io_k8s_apimachinery//pkg/util/intstr:intstr_go_proto",
]
At which point I did require the gazelle:proto_import_prefix
:
go_repository(
name = "io_k8s_api",
build_directives = K8S_DIRECTIVES + ["gazelle:proto_import_prefix k8s.io/api"], # keep
importpath = "k8s.io/api",
sum = "h1:ojgIGmjzUSwsug8H2yVCoueRAGy0IshvrtowuLycEQo=",
version = "v0.0.0-20181027024800-9fcf73cc980b",
)
go_repository(
name = "io_k8s_apimachinery",
build_directives = K8S_DIRECTIVES + ["gazelle:proto_import_prefix k8s.io/apimachinery"], # keep
importpath = "k8s.io/apimachinery",
sum = "h1:QV0MJn7lF87qcyC3Y+On2zKM62Erf99uoORoQbu7lag=",
version = "v0.0.0-20181026144617-b7f9f1fa80ae",
)
And where this has finally brought me is this error:
external/io_k8s_apimachinery/pkg/util/intstr/intstr.go:41:6: IntOrString redeclared in this block
previous declaration at external/io_k8s_apimachinery/pkg/util/intstr/intstr_go_proto_/k8s.io/apimachinery/pkg/util/intstr/generated.pb.go:28:6
Which seems to be the result of somehow getting @io_k8s_apimachinery//pkg/util/intstr:intstr
and @io_k8s_apimachinery//pkg/util/intstr:intstr_go_proto
both in my dependency graph. I'm guessing I probably need another rewrite rule to make things depend only on the intstr_go_proto
version. But who knows what dreadful hacks depend on that Go file, lol.
I'll keep poking at this, but hopefully this is useful to someone out there.
Two more resolve overrides fixes things up:
"gazelle:resolve go k8s.io/apimachinery/pkg/util/intstr @io_k8s_apimachinery//pkg/util/intstr:intstr_go_proto",
"gazelle:resolve go k8s.io/apimachinery/pkg/api/resource @io_k8s_apimachinery//pkg/api/resource:resource_go_proto",
Then, you have to be careful that your BUILD.bazel file has directives to make sure you link against the Bazel-generated Go libraries, not the checked-in ones:
# gazelle:resolve go k8s.io/api/apps/v1 @io_k8s_api//apps/v1:v1_go_proto
# gazelle:resolve go k8s.io/api/batch/v1 @io_k8s_api//batch/v1:v1_go_proto
# gazelle:resolve go k8s.io/api/core/v1 @io_k8s_api//core/v1:v1_go_proto
# gazelle:resolve go k8s.io/api/storage/v1 @io_k8s_api//storage/v1:v1_go_proto
# gazelle:resolve go k8s.io/apimachinery/pkg/apis/meta/v1 @io_k8s_apimachinery//pkg/apis/meta/v1:v1_go_proto
# gazelle:resolve go k8s.io/apimachinery/pkg/runtime @io_k8s_apimachinery//pkg/runtime:runtime_go_proto
So that gets you something. But it turns out the checked-in Go source does more for you than just exist. This skycfg program will now fail with can't assign to .name field of NoneType
def main(ctx):
d = appsv1.Deployment()
d.metadata.name = "hey"
return [d]
Basically it looks like you need to construct every single object?
def main(ctx):
d = appsv1.Deployment()
m = metav1.ObjectMeta()
m.name = "hey"
d.metadata = m
return [d]
With several thousand lines of skycfg I'm not super keen on having to rewrite most of it with explicit constructors ... will this also be an issue after switching to reflection?
Another problem is I can no longer import k8s.io.apimachinery/pkg/runtime
due to a conflict with the generated proto. I depend on for outputting human-readable YAML to make it easier for users during development and debug. I could write my own serdes to solve this but it's convenient to have kubectl
output parity. But in general, if you pull in k8s/client-go
, or want to use schema.GroupVersionResource
you're immediately going to hit the problem. It might be possible to separate the Bazel-generated proto into its own importpath / package? I just doubt that such objects will be accepted by all the existing tooling.
Well that was a fun exercise. I think for the time being I'll stick with the earlier skycfg versions without the v2 proto dependency. I'm totally at the limit of my proto knowledge here, so I look forward to learning what I'm getting wrong ;)
Due to some unrelated circumstances, I do not think I'll be able to finish this work by the end of the year -- and there is now no ETA.
I can review PRs that finish migrating the proto
module to go-protobuf v2, or alternatively, I can roll back the migration and return current HEAD to go-protobuf v1.
Do the followers of this issue have any preference as to the above choice?
I have a version of the proto
module that I'd guess is around 40-50% done -- it handles packages, message types, and enums. Will try to push it to github.com so it can be built upon.
Messages themselves are the missing big piece, plus figuring out a solution to:
- The serialization of Kubernetes YAML, which ought to be backed by the
k8s.io
packages if they're available. - The unexpected change to auto-vivified fields @hagmonk described above. The intended behavior is that fields do not auto-vivify (so that
foo.bar == None
works), but if Kubernetes' custom structs were previously bypassing that then we need at least an option for maintaining backwards compatibility.
The finished parts of the go-protobuf v2 implementation for message reflection have been pushed to trunk
-- they're not wired up to anything, but do have tests based on the existing internal/go/proto_test.go
test suite.
Placeholders for the unfinished part, the protoMessage
implementation, have been pushed to a branch at https://github.com/stripe/skycfg/compare/jmillikin_protobuf-api-v2?expand=1
Compatibility note: I just tagged commit f561f12 (right before the go-protobuf v2 switch) as v0.1.0
. This should allow existing users to run Go module updates without worrying about accidentally picking up the new dependency.
The next tag will happen after the migration has finished and Kubernetes protos have been verified to work with go-protobuf v2.
Are there any updates on this issue with the new v2 improvements that have been merged since?
Are there any updates on this issue with the new v2 improvements that have been merged since?
Ah I see now, for anyone else that is slow like me and still catching up on this, at this point we're just waiting for Kubernetes to migrate away from gogo so they support the new proto registry: kubernetes/kubernetes#96564