protoc-gen-go: support go_tag option to specify custom struct tags
jedborovik opened this issue · 92 comments
Is it possible to define a message with custom tags? For example defining a json name that isn't just the lowercase field name or adding a bson tag.
No, there's no support for that, nor any intention to support that.
I'd like to add that this feature would be extremely useful for validation purposes.
+1
Use case: Using protobuf generated structs along with sqlx package. Would be awesome
Likewise the go library for cloud datastore. Basically, increasing numbers of things make use of struct tags, and without support for them you're left manually duplicating structs? If there are no plans to add this functionality, what is the recommended way to handle it?
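For readers unfamiliar with why struct tags matter here: packages such as sqlx or the datastore client read them through reflection. A minimal sketch of that mechanism, assuming a hypothetical hand-written `Account` struct and a hypothetical `columnNames` helper (neither comes from any of the libraries mentioned in this thread):

```go
package main

import (
	"fmt"
	"reflect"
)

// Account is a hypothetical hand-written struct, not protoc-gen-go output.
type Account struct {
	ID   string `json:"id,omitempty" db:"account_id"`
	Name string `json:"name,omitempty" db:"name"`
}

// columnNames returns the "db" tag value of every field of the struct v,
// falling back to the Go field name when a field has no "db" tag.
// v must be a struct value, not a pointer.
func columnNames(v interface{}) []string {
	t := reflect.TypeOf(v)
	cols := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if col, ok := f.Tag.Lookup("db"); ok {
			cols = append(cols, col)
		} else {
			cols = append(cols, f.Name)
		}
	}
	return cols
}

func main() {
	fmt.Println(columnNames(Account{}))
}
```

Without a way to put the `db` tag on a generated struct, a helper like this only ever sees the `protobuf` and `json` tags, which is the gap this issue is about.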
I would love this feature!
This seems to somehow mitigate the need
I would love this feature as well!
This would be handy, as I need to validate JSON payloads for my REST API portion of the service.
Since golang/protobuf will probably never support this, here are some solutions.
https://github.com/mwitkow/go-proto-validators is useful for validation on proto structs.
https://github.com/gogo/protobuf can be used in two ways to have custom tags:
- You can use the moretags extension: https://github.com/gogo/protobuf/blob/master/test/tags/tags.proto
- Or you can use the typedecl extension if you don't want to generate the golang struct. This allows you to declare your own golang struct with all the tags you want and more.
Need it as well to automate insert/update into the sql database
I've added a retag plugin for protoc-gen-go.
https://github.com/qianlnk/protobuf/tree/master/protoc-gen-go/retag
protoc --go_out=plugins=grpc+retag:. yourproto.proto
@qianlnk Thanks. I tried your plugin. When I use only a plain message there is no problem, but when the message has a sub-message like this
message SendData {
message MetaData{
string Name = 1;//`json:"name;omitempty"`
int64 Length = 2;//`json:"length;omitempty"`
}
message Chunck{
bytes Data = 1;//`json:"data;omitempty"`
int64 Position = 2;//`json:"position;omitempty"`
}
}
This throws an error. Is there any way retag can support this?
Thanks...
@RajeshKumar1990 I fixed it.
@qianlnk I am having a problem making this work.... does it support proto3?
syntax = "proto3";
package contracts;
import "Common.proto";
//import "google/protobuf/timestamp.proto";
message ZZZ {
string id = 1; // `db:"id"`
string clientId = 2;
string datetime = 3;
}
Converting with:
protoc -I=xyz.eu/contracts/proto --go_out=plugins=grpc+retag:xyz.eu/contracts/ xyz.eu/contracts/proto/*.proto
Result:
type ZZZ struct {
Id string `protobuf:"bytes,1,opt,name=id" json:"id,omitempty"`
ClientId string `protobuf:"bytes,2,opt,name=clientId" json:"clientId,omitempty"`
Datetime string `protobuf:"bytes,5,opt,name=datetime" json:"datetime,omitempty"`
}
The db custom tag on id is missing....
Like everyone using proto, I need this in order to work with (any) orm:)
Thanks for your time
@qianlnk thank you for the response. I changed the proto to:
string ID = 1; // db:"id,omitempty"
In the examples above and below, the ` before db and at the end of the line are omitted by GitHub, but they are there. Unlike in your example, I am only using a tag that is not generated by default (no json or xml).
Still no db output:
ID string protobuf:"bytes,1,opt,name=ID" json:"ID,omitempty"
If you have another idea please let me know. I am looking into the plugin code right now.
I tested that the code is called with a print statement in the init. So the environment seems ok.
@atamgp have you installed my retag plugin?
git clone https://github.com/qianlnk/protobuf.git $GOPATH/src/github.com/golang/protobuf
go install $GOPATH/src/github.com/golang/protobuf/protoc-gen-go
this is my try:
syntax = "proto3";
package contracts;
//import "Common.proto";
//import "google/protobuf/timestamp.proto";
message ZZZ {
string ID = 1; //`db:"id"`
string ClientID = 2; //`db:"client_id"`
string Datetime = 3; //`db:"datetime"`
}
run:
protoc -I/usr/local/include -I. \
-I$GOPATH/src \
--go_out=plugins=grpc+retag:. \
test.proto
and result:
type ZZZ struct {
ID string ` protobuf:"bytes,1,opt,name=ID" json:"ID,omitempty" db:"id"`
ClientID string ` protobuf:"bytes,2,opt,name=ClientID" json:"ClientID,omitempty" db:"client_id"`
Datetime string ` protobuf:"bytes,3,opt,name=Datetime" json:"Datetime,omitempty" db:"datetime"`
}
@qianlnk OK, I have it figured out a bit more :)
message ZZZZ {
string ID = 1; //db:"id"
string clientID = 2; //db:"id"
double amount = 3;
}
Things I learned:
1) I have to put the db tag on all fields of a message. Otherwise I get an exception:
panic: runtime error: index out of range (panic: runtime error: index out of range -> indexing [1])
Some fields should not be mapped by an ORM. Is it possible to make the tag optional?
2) Like you said earlier, a field name has to begin with a capital letter, otherwise it is ignored. Of course it is possible to misuse this to realize optional fields (1), but that is less clean.
3) Command path used. I have this folder structure:
$GOPATH/src/X/Y/Z/zzz.proto
I was using a protoc command from the src folder, which did not work:
Working: When calling from within Z: protoc --go_out=plugins=grpc+retag:. zzz.proto
Not working: protoc -I:X/Y/Z --go_out=plugins=grpc+retag:. X/Y/Z/zzz.proto
Working: protoc -I:. -I:X/Y/Z --go_out=plugins=grpc+retag:. X/Y/Z/zzz.proto
4) Importing from another proto: I have a common.proto and a zzz.proto. zzz imports an enum and a message from common.
Working: if I use a working command from point 3) above for one file at a time, e.g.:
protoc -I:. -I:X/Y/Z --go_out=plugins=grpc+retag:. X/Y/Z/zzz.proto
protoc -I:. -I:X/Y/Z --go_out=plugins=grpc+retag:. X/Y/Z/common.proto
Not working: the same commands that work for one file, but with /*.proto instead of a specific proto in order to process both files. E.g.: protoc -I:. -I:X/Y/Z --go_out=plugins=grpc+retag:. X/Y/Z/*.proto
errors: .... is already defined in file ....
It is really nice to process all proto files in 1 time, can you have a look at this?
Last: It is not clean to have all proto files and generated go files in the same folder.
In my case Z is actually named proto and Y is contracts.
I want to have my generated go files put in folder Y instead of Z. Still looking for a working combination command for this...
Has anyone found a solution for converting between type: string and type: bson.ObjectId as well?
Please support this it is extremely useful and cuts down on needless code bloat.
While this idea is super useful in Go, unfortunately no other language that proto supports can really make use of it. As such, changing/supporting this would be outside of the scope of the official Go protobuf package.
@puellanivis, that is not true. Supporting custom tags would open protobufs to a plethora of possible tooling.
I'm going to re-open this issue, but it does not mean that we're going to support this. I'd like to see more discussion on the feasibility of this. Some thoughts:
- I can see the usefulness of this feature for Go-heavy projects, but protobufs were designed as a language-agnostic way to interchange data. Baking more Go-specific details into the proto files seems to be antithetical to the philosophy of protobufs.
- This feature makes the assumption that generated Go protobufs are always structs with fields. What does it mean if we want the ability to generate opaque protobuf types that are interacted with via methods? Performance testing has shown that techniques like lazy unmarshaling can bring significant speed benefits, but this is only feasible with an opaque type.
- The proper way to support this seems to be using protobuf options, but I don't see any proposal here suggesting a syntax for how that would work.
why not add a plugin like
https://github.com/qianlnk/protobuf/tree/master/protoc-gen-go/retag
This plugin is my original idea, and it didn't take too much effort to implement. But I believe that many people need it, so I hope more people can join in.
@qianlnk Looked at your retag plugin as one of several alternatives. I don't really like the idea of cloning it into my golang/protobuf folder, especially since "This branch is 8 commits ahead, 28 commits behind golang:master."
Would you consider making it a standalone plugin instead?
@dsnet See tags.proto which was referred to by @awalterschulze as an example of using the protobuf options approach.
It could perhaps look something like this:
message Enrollment {
uint64 CourseID = 1 [(go_tags) = "gorm:\"unique_index:uid_user_course\""];
uint64 UserID = 2 [(go_tags) = "gorm:\"unique_index:uid_user_course\""];
int32 foo = 3 [(go_tags) = "sql:\"type:text\""];
string bar = 4;
}
This would be a very useful addition to avoid a lot of boilerplate, and IMO it would fit nicely in with other language-specific options, such as java_package to name just one. Just like java_package is ignored when generating Go files, go_tags would be ignored when generating non-Go code.
The only thing I don't like about the above is the need for the \". It would be nice if that could be avoided, but it is a compromise I don't mind if we can get this feature into the main project.
@dsnet Regarding your point about opaque types. I think there are two main use cases that have come up in this issue.
- Boilerplate avoidance for database/storage backends.
- Validation of input fields.
I think the first is best dealt with using something like my proposed go_tags, because the tools that process the tags are their own beasts and it seems difficult to interact via lazy functions.
Validation of input fields seems different though. I can imagine that validation could be done as part of lazy unmarshaling, but that would require another option such as in go-proto-validators. However, such lazy unmarshaling with validation is not Go-specific, and so I don't think it makes sense to put such validation data into struct tags in the generated code. Indeed, go-proto-validators does not put validation data in struct tags.
IMO, it's better to provide a global option, for example go_tags with a value like "bson" or "bson,xml".
It could perhaps look something like this:
syntax="proto3";
package user;
option go_package = "user";
option go_tags = "bson";
message User {
string id = 1;
string name = 2;
}
Which generates:
type User struct {
Id string `protobuf:"bytes,1,opt,name=id" json:"id,omitempty" bson:"id,omitempty"`
Name string `protobuf:"bytes,2,opt,name=name" json:"name,omitempty" bson:"name,omitempty"`
}
protoc-gen-go generates json tags by default, but sometimes we need other tags for marshal/unmarshal. As a result, I have to write another struct that is the same as the generated one except for the tags.
A global option may work for your use case with bson, but many other use cases require per field options.
@meling I don't think it's a good idea to add more information to every field option. protoc-gen-go binds structs to json, but we use structs in many cases, such as bson and xml and some custom cases. The global option mentioned above solves that problem, and not only in my case.
For database/storage, for example gorm, using tags to define index/unique_index is a crazy action. We use protobuf with many languages, and an option on every field is useless for python/java. These options make the .proto unreadable for python/java developers. What's more, an option on every field makes the .proto file hard to read and modify, since it also carries a lot of service logic. Simple is better.
Why is it crazy to define index fields using standard protobuf options syntax? I didn’t design the option syntax for protobuf, but I learn to adapt. Just because something is unreadable to some people doesn’t mean it is crazy. What is crazy is to do “manual” conversion between a protobuf struct and a database struct when they are otherwise identical, because that is just error prone... and slow.
Of course your approach is simple, but it doesn’t solve the problems of at least half of the people asking for this feature...
A file-level option for tags is off the table. It requires protoc-gen-go to have a semantic understanding of all the tag formats that users may want, thus coupling a variety of struct tag formats into the proto repo. Individual field options would be the way to go since they are more extensible and do not bake package-specific logic into the generator.
👎
I abandoned my efforts on this since I saw the posts. So now I keep separate models (with JSON mapping to do the boilerplate) for protobuf generated structs ("Remote" structs, as I suffixed them now). A bit more code and another layer of indirection, but it works and has been in production for 4 months now.
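One way to read the "JSON mapping" workaround described above is a round trip through encoding/json between the generated struct and a hand-written model that carries the extra tags. This is only a sketch of that reading: `UserRemote`, `User`, and `toModel` are hypothetical names, and the approach assumes both structs use the same json keys.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// UserRemote mimics a protoc-gen-go struct (hypothetical, hand-written here).
type UserRemote struct {
	Id   string `json:"id,omitempty"`
	Name string `json:"name,omitempty"`
}

// User is the hand-written model that carries the additional bson tags.
type User struct {
	ID   string `json:"id,omitempty" bson:"_id"`
	Name string `json:"name,omitempty" bson:"name"`
}

// toModel copies r into a User by marshaling to JSON and unmarshaling again.
// It relies on both structs sharing the same json keys.
func toModel(r *UserRemote) (*User, error) {
	raw, err := json.Marshal(r)
	if err != nil {
		return nil, err
	}
	var u User
	if err := json.Unmarshal(raw, &u); err != nil {
		return nil, err
	}
	return &u, nil
}

func main() {
	u, err := toModel(&UserRemote{Id: "42", Name: "ada"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *u)
}
```

The round trip costs an extra encode and decode per message, which is the "another layer of indirection" trade-off mentioned in the comment.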
@willks From your first comment it seems like you need to do validation. I wonder if you could provide some more details about your use case, e.g. an example. In my earlier reasoning I found that adding validation data to struct tags may not be the best approach, since ideally you would want to have such validation data (e.g. upper/lower bound on an integer value) encoded such that the protoc compiler can generate validation code for all supported languages, not just Go. If validation is the main purpose, perhaps you could make the case to add such features to the top-level protobuf compilers (I have no idea what the politics would be or if something like this has already been proposed...)
However, avoiding boilerplate code to translate proto messages that map identically to a database/storage model, that may need some additional go tags for the storage model, I do think making use of a go_tags option makes sense. My interpretation of @dsnet is that the project is open to accepting something along these lines... I have been thinking about writing up a patch myself, but haven't found the time, and the project where I need this is currently on hold.
I would encourage more people to submit examples of how they imagine that they would use such custom options to generate custom Go tags.
To start this off, let me point to our Autograder project where we want to use this: ag_service.proto. We tried using gogoproto.moretags, but there are some compatibility issues with the grpc-web frontend codebase in TypeScript that remain unresolved.
RE validation: I would like to note one of the reasons why the required/optional distinction was eliminated was because of a considered position within (and without?) Google that validation should not be a part of the protobuf process itself, but rather consumers of a protobuf should be validating protobufs themselves.
Boilerplate validation is convenient for quickly throwing validation into something, but in the end, the inflexibility results in difficult to maintain, if not unmaintainable code. (The second one wishes to do anything other than just reject the protobuf outright, a complete rewrite must be performed, and oftentimes, removing the baked-in validation from the protobuf is impossible, due to infrastructure being built upon the assumption.)
for me the major benefit of this feature would be model binding: i have a REST service implemented in go (with gin-gonic), the body is encoded in JSON and can be mapped to a go-struct using the json-annotations.
now this REST server is actually just a gateway to embedded devices which use ProtocolBuffers for the de-/encoding of messages. without any possibility to manipulate the annotations generated by protoc i have to bind the full message 1:1 to the (JSON) REST service.
implementing, e.g., a message with some field holding internal data (only used by the gateway and the device but not exposed by REST) is not possible without either manipulating the generated code OR duplicating each and every message this gateway sends through > which defeats the whole point of using ProtocolBuffers.
more likely use-case: the API is defined externally. someone suddenly decides that CamelCase is a good option instead of underscores > now instead of simply modifying the JSON annotations i'd have to either re-generate and thus re-program all devices or introduce boilerplate (mapping)
@meling You are right, this is not something that protobufs/gRPC needs to be concerned about. Protobufs/gRPC is the transport mechanism, it's how we convert it to the relevant domain model that we need to be concerned about. Hope that makes sense :) I swapped to Java for my server side since my last post, consolidating of the logic based on generics now. My validation is made on the "mapped" objects in Java to return the relevant errors.
My clients are both iOS/Android. It has been addressed so I take back my previous post :) Thanks for replying.
Here's a formal proposal.
Background
The protoc generators for Go protobufs have historically generated proto messages as structs with fields since 2009. The major benefit of structs with exported fields is that they produce more idiomatic Go code and are generally nicer to use for users. However, at the same time, users were never given control of struct field tags, which are arguably an important aspect of how Go structs can interact with the rest of the ecosystem (e.g., encoding/xml, gopkg.in/mgo.v2/bson, etc).
Proposal
I propose we add support for a go_tag proto option that enables the user to specify the struct tag value for a field.
Example:
syntax = "proto3";
package example.gotag;
message Foo {
string Field1 = 1 [go_tag = "xml:\",chardata\""];
oneof Field2 {
option go_tag = "xml:\",chardata\"";
string Field2a = 2;
string Field2b = 3;
}
}
The specifics:
1. This option only affects FieldOptions and OneOfOptions and is ignored when the option is specified on any other option dimension.
2. Only generators that emit messages as structs with exported fields need to respect go_tag. Those that don't can ignore it.
3. The syntax of go_tag is any arbitrary string. It is recommended (but not required) that the string follows the tag convention used by nearly every Go package. The value of this tag is concatenated to any tags already generated by default. For protoc-gen-go, we already generate protobuf and json tags, which we should phase out over time (see points 4 and 5).
4. For protoc-gen-go, phase out generation of protobuf struct tags.
The information held in the protobuf and protobuf_xxx tags is an implementation detail that should stop leaking out to the public API. The work to make Message behaviorally complete (#364) will provide the proper API to supply the proto type information that was originally encoded in the protobuf tags.
5. For protoc-gen-go, phase out generation of json struct tags.
The json tags were a half-hearted attempt at satisfying the JSON<->Protobuf mapping. However, the encoding/json package is fundamentally insufficient to properly satisfy the mapping for all features of protobufs (e.g., well-known types, required fields, etc). Rather than provide the illusion that we properly satisfy the specification (when we don't), we should drop support for the mapping entirely. If users desire the encoding/json package to be equivalent to the JSON<->Protobuf mapping, then see #256. If users desire to directly control how encoding/json interacts with generated messages, then they can use the go_tag option to control the tags.
Item 5 is a breaking change. See #526 for discussion on how we can evolve the generator.
Use cases
This feature enables full control of struct tags by the user in a simple way.
- This supersedes the jsontag and moretags options in gogo/protobuf.
Design specifics
What happens if the go_tag contains the json tag?
For the near future, users should avoid doing this since it will conflict with the json tag already generated by default. It is up to each generator implementation whether to explicitly ensure that the json tag is not set to avoid a conflict.
What happens if go_tag is specified on fields belonging to a oneof?
A generator may ignore it if the fields of a oneof are not represented as a struct (as is the case for protoc-gen-go). However, if a generator does represent a oneof as a flat struct with nullable fields, it may choose to place the tag on the corresponding field.
What happens if go_tag is specified on an extension?
The generator should error in this case. It is impossible for extensions to be added to a struct at proto compile time since extended fields can be added by any proto file.
What about generating "opaque" messages?
There have been some proposals to possibly change the layout to support other generated APIs that are more conducive to lazy unmarshaling, where only a small set of fields are accessed. Such APIs would avoid being structs with fields, and would instead enable access to fields via setter/getter methods. In such a case, struct tags have no semantic meaning.
At this point, it is probably reasonable to assume that the specific API generated by protoc-gen-go will always be structs with fields. Support for different APIs should be performed by an entirely different generator implementation. Generators that do not emit messages as structs with fields should just ignore the go_tag option.
Total bikeshed: go_struct_tag.
Loving this proposal.
This is one of the harder ones because of the json tag already being generated and the backwards compatibility problems that it might cause. And then also adding the confusion of jsonpb vs encoding/json.
I am optimistic that this proposal helps in both regards.
A possible solution for number 5 would be to create protoc-gen-gov2 that does not generate any json tags and breaks backwards compatibility.
Then we can add a preprocessing step to protoc-gen-go that adds all the json tags as new go_tag options. That way the generator library code only needs to maintain the newest version.
If we semi-controversially live at head, then we can also provide a tool to upgrade .proto files to include the json tags inside go_tag options, for users who want to keep this feature.
But those are just some ideas for number 5.
Overall I think this proposal looks awesome.
@dsnet Thank you very much for taking the time to phrase this awesome proposal. I completely agree with you that it would probably be better to drop the auto generated json tags.
However, to me, the readability of tags as protobuf options does not seem to be very good. This may lead to errors. Additionally it is actually quite cumbersome to write tags this way – which I would say does not align very well with the go spirit.
Therefore, another approach could be to keep the go struct tag information completely separate from the proto definition, i.e. in an adjacent .go file. This information could then be used to post process the .pb.go files and apply the go tag definitions to the generated structs. This would avoid putting non transport related information into the proto files while still allowing us to define custom go struct tags. We could even go a step further and define custom functions that are merged onto the generated structs.
mytype.proto, mytype-tags.go -> mytype.pb.go (with tags applied from mytype-tags.go)
For now, we wrote a small utility that allows us to use comments to specify tags. We then post process the .pb.go files and apply the comments as go struct tags using the go abstract syntax tree.
A minor inconvenience is that the protoc grpc plugin will not keep same line comments, so the tag comments must be placed above the field definition in the proto file.
The utility and a complete example can be found here: https://github.com/dkfbasel/protobuf
There was a very nice json library in Java that is worth looking at for some ideas
https://github.com/pascallouisperez/jsonmarshaller/blob/wiki/Options.md
Java's annotations are imo (one of the only things) better than Go's tags.
The jsonmarshaller library allowed for different views, eg different names based on a view name.
Potentially issue 5 (PB <> JSON) should be solved by a library that ignores the json tags on the generated protobufs.
Here is another simple solution
@dsnet the go_tag is useless for other languages. And actually, this repo can only generate golang code.
Yes, this repo contains the logic to generate .pb.go files. However, a .proto file is ingested by generators that produce source code for other languages. Magical comments are never going to fly compared to a proto option like go_tag or something else.
@lvht Thanks for your input on this. Let me clarify some other concerns I have with your proposal, beyond our lack of enthusiasm for magical comments in proto files.
As I pointed out here, I think it's a bad idea to use Go's struct tags to express data for validation functions. This is because you would either need to use reflection in the Validate() function to extract the data from these struct tags (slow), or use another generator to generate the Validate() function (pointless).
Moreover, you would also need to define a "validator language" that can then be translated into code for the Validate() function (as in go-proto-validators). Note that in go-proto-validators, no validation data is stored in Go's struct tags. This seems like something that should be its own effort and discussed elsewhere, and be part of the protobuf language, or a separate language for validation. As a side note, I don't particularly care for notation such as lt and gt etc.
As a simpler solution, I could imagine a command line flag that adds a call to a user-defined Validate() function that gets called after unmarshaling to check message fields. Or you could probably avoid a command line flag by defining a boolean for each message type that is checked before calling Validate() as outlined below.
In .pb.go:
var innerMessageHasValidate bool
func (m *InnerMessage) Unmarshal(dAtA []byte) error {
// unmarshal message into m's fields
if innerMessageHasValidate {
return m.Validate()
}
return nil
}
In a user-defined validate.go file:
func init() {
innerMessageHasValidate = true
}
func (m *InnerMessage) Validate() error {
// user-defined validation
return nil
}
Such Validate() functions must then be defined in a separate .go file and placed in the same folder as the .pb.go file (or similar for other languages). This would allow full language flexibility in how to define the validation, and wouldn't need its own "validator language". Of course, the drawback is that you would need to write the same Validate() logic for each language, which would be error prone. So there is definitely a case to be made for a "validator language", but I don't think this is the place to address that. Either way, I think the above structure with a Validate() function can be made compatible with a future validation generator, based on some "validator language".
Beyond the comment that one should just use interface implementation assertion to test for innerMessageHasValidate, yeah. Validation is actually a Hard Problem ™️ and beyond the most simple of cases would require a whole coding language in order to really properly handle validation. But then as well, there is the tendency of validation to go awry.
Internally at Google, the idea of validation of proto values internal to the proto library itself (even required values) was phased out precisely because it breaks forwards and backwards compatibility of the protobuf library itself. If validation is baked into the protobuf marshal or unmarshal itself, then it must now forever be a part of the protobuf definition. An older user of the protobuf with the older validation may still be a legitimate receiver of the protobuf, yet it would now break on what has become valid input.
As such, the idea and scope of protobuf was centralized towards storing data in an efficient format that will provide some assurance of forwards and backwards compatibility (unless you broke your own protobufs, by for example reusing an obsoleted field tag). If one extends protobufs with validations, then there is no way to get around those validations, even when doing so is appropriate. So, it is far better to put validation explicitly in code (where it can arbitrarily check things as necessary, and update as it will without endangering the data storage itself), and where one can ensure that it is only checked on one-side. (Validating an out-going protobuf is essentially pointless, as the receiver cannot trust the encoder‘s validations.)
I would present to you a protobuf that defines a field for an email address, and uses a validation regex that had an error in it, causing valid emails to be rejected. Then, an update to the protobuf definition is made, and now the client can encode previously rejected emails, but the server still fails to unmarshal, or vice versa.
In order to retain maximum future compatibility, the protobuf definition cannot ever become any more permissive than the least permissive version of the protobuf. Which is why once a field is required, it becomes a “must always and forever appear”, and thus why it was removed for proto3.
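The "interface implementation assertion" mentioned at the start of this comment can replace the per-message boolean flag. A minimal sketch, assuming hypothetical `InnerMessage` and `Plain` types standing in for generated messages and a hypothetical `validateIfSupported` helper; none of this is generated by protoc-gen-go:

```go
package main

import (
	"errors"
	"fmt"
)

// validator is satisfied by any message type that defines Validate.
type validator interface {
	Validate() error
}

// InnerMessage is a hypothetical stand-in for a generated message
// with a user-defined Validate method.
type InnerMessage struct{ Name string }

func (m *InnerMessage) Validate() error {
	if m.Name == "" {
		return errors.New("name must not be empty")
	}
	return nil
}

// Plain is a hypothetical message without a Validate method.
type Plain struct{ Name string }

// validateIfSupported calls Validate only when msg implements validator,
// so no per-message boolean flag or init function is required.
func validateIfSupported(msg interface{}) error {
	if v, ok := msg.(validator); ok {
		return v.Validate()
	}
	return nil
}

func main() {
	fmt.Println(validateIfSupported(&InnerMessage{}))
	fmt.Println(validateIfSupported(&InnerMessage{Name: "a"}))
	fmt.Println(validateIfSupported(&Plain{}))
}
```

The assertion is evaluated at run time, so the cost question about the unmarshal critical path raised in this thread still applies; the sketch only shows the shape of the check.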
Thanks @puellanivis for your clarifying comments. I thought about interface implementation assertion too, shortly after posting my comment, but I wonder if that is a more costly test to perform in the critical path of the unmarshal function.
Anyway, I agree that it does not make sense to do validation in the marshal function. I totally see your points about compatibility, but there are also security/trust concerns with the data one receives. The question is how to balance these two concerns. Seems to me that the best approach (for now at least) is to do validation in each language implementation. I think it would be rare that one would need to implement multiple validation functions in different languages.
@dsnet @puellanivis does it make sense to write this up as a proposal in a separate issue, since this is really orthogonal to the go_tag issue?
Let's keep this discussion about go_tag and not about validation, which is beyond the scope of what this repo is responsible for. There are other projects like https://github.com/lyft/protoc-gen-validate that attempt to solve the same problem.
Ok I am sorry, but getting all these messages about a validation language makes it really hard not to mention my own, especially now, since one has already been mentioned ;)
Here is a playground for Relapse, an encoding agnostic validation language
https://katydid.github.io/play/
It is super fast on marshaled protobufs, so you can do validation on input, right before unmarshaling
https://github.com/katydid/katydid
Currently only Go and Haskell are supported, but the testsuite is cross language.
I've managed to create a protoc plugin for this here. Currently it does not support oneof but I will add that in the coming days. I plan to finalise the proto API for specifying tags. Then I'll work on adding tests and restructuring the internals without affecting the proto API. Any thoughts on the following style?
message Example {
string with_new_tags = 1 [(tagger.tags) = "graphql:\"withNewTags,optional\"" ];
string with_new_multiple = 2 [(tagger.tags) = "graphql:\"withNewTags,optional\" xml:\"multi,omitempty\"" ];
string replace_default = 3 [(tagger.tags) = "json:\"replacePrevious\""] ;
}
It achieves this by parsing the generated Go files using the go/ast package, then updating the struct tags and rewriting the files to disk. The only downside is that it has to run after protoc-gen-go has run.
UPDATE 1:
I've added support for oneof fields as well. Any suggestions on the API names are most welcome.
UPDATE 2:
Added support for adding tags to exported XXX* fields.
@meling You don't have to fork protobuf and keep updating it, you can just use this as an additional step in the generation process.
@dsnet Thank you for the official proposal! This really helps in the enterprise world a lot!
I am looking for this also.
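So many people need this feature, why hasn't it been developed yet!!
One API model, one database model, one API-to-database converter, one database-to-API converter ........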
This PR #969 is somewhat related to this.
Did anything ever come of this @dsnet? Sorry for the at but I figured since it was your proposal it might be nice to surface :-)
I am sorry if this is a duplicate, as I am not sure how I ran into this project:
https://github.com/srikrsna/protoc-gen-gotag
It solved the problem for me. It might not be the correct solution, but the result is what I was looking for.
My workaround was to use sed in gen.go. For example:
//go:generate protoc --go_out=. myfile.proto
//go:generate sed -i "s/\\(MyField .*\"\\)`/\\1 datastore:\"noindex\"`/" myfile.pb.go
That turns
MyField string `protobuf:blah..."`
into
MyField string `protobuf:blah...." datastore:"noindex"`
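To make it concrete why the extra tag matters, here is a small sketch of how a library typically reads it through reflection. The Record type and the TagOf helper are assumptions for illustration; the tag values are made up and are not real protoc-gen-go output.

```go
package main

import (
	"fmt"
	"reflect"
)

// Record mimics a generated struct after the sed step has appended a
// datastore tag. The tag contents are illustrative only.
type Record struct {
	MyField string `protobuf:"bytes,1,opt,name=my_field" json:"my_field,omitempty" datastore:"noindex"`
}

// TagOf returns the value stored under key in the struct tag of the named
// field, and whether that key was present. v must be a struct value.
func TagOf(v interface{}, field, key string) (string, bool) {
	f, ok := reflect.TypeOf(v).FieldByName(field)
	if !ok {
		return "", false
	}
	return f.Tag.Lookup(key)
}

func main() {
	value, ok := TagOf(Record{}, "MyField", "datastore")
	fmt.Println(value, ok)
}
```

Each package looks up only its own key, which is why several tags can share one field without interfering with each other.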
Seriously though guys, @dsymonds please implement some support for this in protoc-gen-go already.
It is open source project. One can make a PR, if really needs it.
There were a few attempts in helping to provide this in the form of PRs here or repo forks with comments in this and similar duplicate issues. It just does not seem to be interesting or important for the maintainers.
Most who needed the feature either rolled out their own tiny forks and scripts, or switched to gogo/protobuf.
solution N:
use sed in bash after protoc
sed 's/json:("[^"]+,omitempty")/json:\1 bson:\1/g'
json:"" tags are deprecated(?), so I wouldn’t rely upon them.
Thank you renld!
I just added -i and -E and got it working.
sed -i -E 's/json:("[^"]+,omitempty")/json:\1 bson:\1/' main.pb.go
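If you would rather not depend on sed flags that differ between platforms, the same substitution can be sketched in Go with the regexp package. AddBSON is a hypothetical helper name, and the sketch assumes the generated source is already loaded into a string; reading and writing main.pb.go is left out.

```go
package main

import (
	"fmt"
	"regexp"
)

// jsonTag matches a json tag ending in ,omitempty and captures its quoted value,
// mirroring the pattern used in the sed command.
var jsonTag = regexp.MustCompile(`json:("[^"]+,omitempty")`)

// AddBSON duplicates every matching json tag as a bson tag with the same value.
// Unlike the sed command without /g, it replaces every match, not just the
// first one on each line.
func AddBSON(src string) string {
	return jsonTag.ReplaceAllString(src, "json:$1 bson:$1")
}

func main() {
	line := "Id string `protobuf:\"bytes,1,opt,name=id\" json:\"id,omitempty\"`"
	fmt.Println(AddBSON(line))
}
```

Like the sed version, this is a textual rewrite and would also touch any matching text outside struct tags.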
@qianlnk I am having a problem making this work.... does it support proto3?
syntax = "proto3";
package contracts;
import "Common.proto";
//import "google/protobuf/timestamp.proto";
message ZZZ {
  string id = 1; // `db:"id"`
  string clientId = 2;
  string datetime = 3;
}
Converting with:
protoc -I=xyz.eu/contracts/proto --go_out=plugins=grpc+retag:xyz.eu/contracts/ xyz.eu/contracts/proto/*.proto
Result:
type ZZZ struct {
	Id string `protobuf:"bytes,1,opt,name=id" json:"id,omitempty"`
	ClientId string `protobuf:"bytes,2,opt,name=clientId" json:"clientId,omitempty"`
	Datetime string `protobuf:"bytes,5,opt,name=datetime" json:"datetime,omitempty"`
}
The db custom tag on id is missing....
Like everyone using proto, I need this in order to work with (any) orm:)
Thanks for your time
That's not true. Gorm will automatically use the Id field as your primary key.
A key feature of protocol buffers is that they are a language-neutral, platform-neutral serialization mechanism. A data structure defined in a .proto file can be passed between different implementations or programming languages without loss of information. We should not add features to the Go protobuf implementation which are not accessible to other implementations.
A way to interrogate whether a feature of Go protobufs is appropriate or not is to conduct a thought experiment: If we have a Go service that reads and writes protobuf messages using that feature, can we practically rewrite it in another language without changing the message definitions or affecting clients of the service?
Imagine, for example, a feature which disables UTF-8 validation of string fields. This would be useful for some Go users, since Go strings are not required to contain UTF-8 data. However, Java strings are; the Java protobuf implementation has no way to represent non-UTF-8 string data. We could not rewrite a Go service using non-UTF-8 strings in Java without changing the field definition (from string to bytes) or making changes to the Java protobuf implementation.
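The point about Go strings can be checked with the standard library alone. The sketch below only illustrates the language behaviour; the helper name IsValidProtoString is an assumption and does not refer to any function in the protobuf module.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// IsValidProtoString reports whether s is well-formed UTF-8, which is the
// property the hypothetical feature would stop checking.
func IsValidProtoString(s string) bool {
	return utf8.ValidString(s)
}

func main() {
	// A Go string is just a sequence of bytes, so this is a legal value
	// even though it is not valid UTF-8.
	raw := string([]byte{0xff, 0xfe})
	fmt.Println(len(raw), IsValidProtoString(raw))
	fmt.Println(IsValidProtoString("héllo"))
}
```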
In contrast, imagine a feature which permits setting the Go type used for a protobuf bytes field to either string or []byte. This would only affect the internal representation of data in the Go code. If we rewrote the service in another language, the annotation would no longer be useful, but harmless.
So, do custom struct tags constrain interoperability or not? At first glance, it might seem that they do not. As in the case of using string or []byte to represent a bytes-valued field, the struct tag appears only in the Go code. If we consider the use cases for these tags, however, it is not that simple.
One use case is message validation: A tag will be set on some fields indicating what data it may contain, and some package will inspect those tags when checking a message for validity. There is no good way to write that validator in some other language, since it depends on Go-specific annotations. Fortunately, we have a better, portable alternative: We can use protobuf options to annotate fields and inspect those options using the protoreflect package. (A version of this approach is described in "A new Go API for Protocol Buffers".) Not only will these annotations be accessible from other languages, using protoreflect will permit the validator to operate on other Go representations of protobuf messages, such as the one provided by the dynamicpb package.
The other often-cited use case is to permit the use of alternative serialization implementations, such as the "encoding/xml" package. It seems clear that this runs directly counter to the goal of protobufs as a language-neutral serialization mechanism: It is very unlikely that a message serialized using "encoding/xml" and some set of custom field tags will be deserializable by any other implementation. Protocol buffers provide three official serialization formats--binary, JSON, and text--and great care is taken to ensure that data serialized by one implementation can be losslessly deserialized by any other. If additional serialization forms are desired (XML, BSON, etc.), then they should be defined as a general-purpose mapping to and from the protobuf data types and implemented in a generic fashion, probably using protobuf reflection as the JSON and text serializations do.
In general, however, if a user needs to represent a protobuf data structure in some encoding not supported by protobufs (e.g., XML), then it is usually best to first encode the protobuf to a supported form (binary, JSON, or text), and then wrap the serialized data in the desired format. Alternatively, this may point at a case where protobufs are not a good fit for the job.
In both the case of validation and custom serialization, custom field tags prevent interoperability with other protobuf implementations, are an inferior alternative to custom field options accessed via protobuf reflection, or both.
For these reasons, we do not believe we should add support for custom field tags to the Go protobuf generator.
I will leave this issue open for a few more days for final comment.
I am not contesting neild's post.
How does the logic above apply to JSON support in the library? ;) Why does JSON get support while XML does not? An unsigned 64-bit integer is serialized as a string in JSON. That's very specific support for a target format...
We have to understand why people want this feature in the 1st place. The #1 motivation is to be able to tag structures for use by another library like DB layer (https://github.com/upper/db is just one example).
IMO, Protocol Buffers team needs to focus on helping developers to make it easy to assign protocol buffer generated structures to types that have required tag data. Unfortunately, it is not possible today to do this:
type DBRecord struct {}
var record DBRecord = pbRecord
The problem is with the additional fields that PB generates. Those extra fields make it impossible to do an assignment of variables and require a manual field-by-field copy. I know that, because I had to code a lot of functions to copy data from PB and into PB.
Specification is clear here:
It would be great if PB team could help us with copy data graph operations. I think there are a number of approaches that can be taken here.
Those extra fields make it impossible to do an assignment of variables and require a manual field-by-field copy.
While I understand the desire to do this, I don't see a way for this feature to be provided without going against protobuf's goal of being backwards and forwards compatible. In particular, the ability to add new fields without breaking current usages. In order for this assignment (or a cast) to work, it requires that the memory layout of DBRecord be identical to the memory layout of the generated protobuf message type. Fundamentally, this adds a constraint that .proto source file be atomically updated with the equivalent Go type definition, which goes against one of the primary goals of protobufs.
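The constraint can be seen in plain Go without any protobuf code. In the sketch below all type names are stand-ins invented for illustration: pbRecord has only data fields, while pbRecordFull adds a bookkeeping field of the kind generated code usually carries. Since Go 1.8, struct tags are ignored when converting between struct types, so the first conversion compiles; the second case needs a field-by-field copy.

```go
package main

import "fmt"

// pbRecord stands in for a generated message that has only its data fields.
type pbRecord struct {
	Id   string `protobuf:"bytes,1,opt,name=id" json:"id,omitempty"`
	Name string `protobuf:"bytes,2,opt,name=name" json:"name,omitempty"`
}

// DBRecord has the same fields in the same order, with different tags.
type DBRecord struct {
	Id   string `db:"id"`
	Name string `db:"name"`
}

// pbRecordFull adds an extra internal field (hypothetical name).
type pbRecordFull struct {
	Id            string
	Name          string
	unknownFields []byte
}

// ToDB uses an explicit conversion; it compiles because the field names,
// types, and order match, and tags are ignored in conversions.
func ToDB(r pbRecord) DBRecord {
	return DBRecord(r)
}

// ToDBFromFull must copy field by field: DBRecord(*r) would not compile
// because the two struct types no longer have identical fields.
func ToDBFromFull(r *pbRecordFull) DBRecord {
	return DBRecord{Id: r.Id, Name: r.Name}
}

func main() {
	fmt.Println(ToDB(pbRecord{Id: "1", Name: "a"}))
	fmt.Println(ToDBFromFull(&pbRecordFull{Id: "2", Name: "b"}))
}
```

Note that even in the first case a plain assignment such as var record DBRecord = pbRecord{} is rejected; only the explicit conversion is allowed, and it stops compiling as soon as either side gains a field.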
We can go on and on with this feature, but if authors are very insistent on "my way or highway", there is not much we can do here.
This is one of the most requested features desperately missed from the original protoc-gen-go. Missed so much that people switched to gogoprotobuf just to have it. If implemented, it would allow protoc-gen-go generated code to be useful in more scenarios than the original authors thought of, and to be a good "code neighbour" by allowing a feature that the library does not use directly but that potentially makes life much easier for devs who intend to do "one more little thing" with the generated code. Continuing with the neighbour analogy, it can also choose to be an old sad grumpy "get off my lawn" type, and just never allow kids to draw on the sidewalk because "it is not what the sidewalk is for".
After 5 years of begging through this bug being open... This is actually quite sad.
if authors are very insistent on "my way or highway", there is not much we can do here.
To my recollection we have never said "my way or highway". We acknowledge the benefit of this feature, but we have always pointed at technical reasons as to the detriments of this feature. Fundamentally, we prioritize adherence to the overall protobuf ecosystem and this feature goes against that priority. We understand that this is not the preference of some users, but engineering is about evaluating different options, understanding the benefits and detriments of all options and making a choice.
After 5 years of begging through this bug being open... This is actually quite sad.
This statement makes it sound like we don't care about users. I should note that the fact that we spend significant amount of time engaging with people (even if we may disagree on technical points) should hopefully show that we care.
I can go back, starting from the very first response on this thread, and continue to comb this (or other) thread(s) to support what I have said, but it is not a productive use of my or your time.
You reopened this thread to gather users' opinions? I think it is quite clear by now that we (users) wholeheartedly voted for it. I think there is quite an obvious way to "show that we care" other than continuing to discuss it.
Thank you for your time.
Thank you @dsnet. You've done a great job supporting the project over the years.
As a community we aim to provide feedback on real world implementations and sought-after features. I don't have much to add outside echoing @kulak where in most/all projects I've found myself copying data from/to PB.
There is a natural flow into the DB layer and the Go ecosystem makes heavy use of tags within that layer. Capturing this in a single generated object would be a huge boost in productivity for the Go community.
I may not agree with your final stand, but respect the adherence to the overall protobuf ecosystem. Keep up the great job.
I understand the engineering decision, but it is nonetheless sad. Some projects don’t care about compatibility with other languages. And doing manual copying between structs is error prone, and boring... I’m not hopeful that someone will pick up and fix gogo, so that it becomes compatible with this library, at least not in a releasable form in the next five years... unless someone puts a full time team on it for the next year or so... it will probably be easier to start from scratch.
But I guess we can just hack this ourselves... I might do that when I get some spare time. Not likely to happen soon, so if anyone beats me to it, let me know.
Candidly, we’ve concluded the cost of abandoning protos at the DB layer is lower than continuing to work with them due to lack of struct tags.
I would summarize the maintainers’ stance as the Protobuf ecosystem is more important than the language ecosystem(s) that it works within.
Given that most of us work primarily with language $x (in this case Go) more than we do with protos, this is pretty sad.
We should not add features to the Go protobuf implementation which are not accessible to other implementations.
You should absolutely add language-specific features which are not accessible to other implementations if it means that language’s implementation conforms to the norms and expectations of that language. Every other Protobuf implementation does. Why is Go excluded?
A way to interrogate whether a feature of Go protobufs is appropriate or not is to conduct a thought experiment: If we have a Go service that reads and writes protobuf messages using that feature, can we practically rewrite it in another language without changing the message definitions or affecting clients of the service?
Folks use Protobufs to define schemas used for other purposes than talking to a service. For example, WhatsApp uses it to serialize encrypted messages in a SQLite blob field. Square (and others) use it to serialize complex data structures to MySQL.
Imagine, for example, a feature which disables UTF-8 validation of string fields. This would be useful for some Go users, since Go strings are not required to contain UTF-8 data. However, Java strings are; the Java protobuf implementation has no way to represent non-UTF-8 string data. We could not rewrite a Go service using non-UTF-8 strings in Java without changing the field definition (from string to bytes) or making changes to the Java protobuf implementation.
You mean like this Java-specific feature?
Regardless, your example (UTF-8 validation) is specious. Struct tags or the Go field names (see #555) only affect the Go implementation, and do not affect the interoperability of the protobuf messages.
Do you plan to remove json_name and JSType?
One use case is message validation: A tag will be set on some fields indicating what data it may contain, and some package will inspect those tags when checking a message for validity. There is no good way to write that validator in some other language, since it depends on Go-specific annotations.
Sure, struct tags can be used (or abused) for purposes outside of serialization. Just because there’s a footgun doesn’t mean folks are going to use it. Should you not trust your users, instead of trying to shield them from vague hypotheticals?
In both the case of validation and custom serialization, custom field tags prevent interoperability with other protobuf implementations, are an inferior alternative to custom field options accessed via protobuf reflection, or both.
Hah, nope.
You mean like this Java-specific feature?
I should have been clearer that I was referring to validation of proto3 string fields, which that setting does not, to my knowledge, affect.
Do you plan to remove json_name and JSType?
The json_name option is language-neutral. It affects the JSON serialization of a field, in much the same way field numbers affect the binary serialization.
The JSType option is identical to the string/[]byte hypothetical I gave as an example of something that does not impact interoperability. (I do want to add the ability to select between those two types for the Go representation of proto string and bytes fields; it just hasn't made it to the top of the priority list yet.)
The JSType option is identical to the string/[]byte hypothetical I gave as an example of something that does not impact interoperability. (I do want to add the ability to select between those two types for the Go representation of proto string and bytes fields; it just hasn't made it to the top of the priority list yet.)
JSType is really close to what the OP for this issue describes.
An option to select between []byte and string representation sounds terrific. What about extending this to well-known-types, like using *string for StringValue, or time.Time for Timestamp?
Candidly, we’ve concluded the cost of abandoning protos at the DB layer is lower than continuing to work with them due to lack of struct tags.
@ydnar , we are about to face the same dilemma, and I am curious what did you choose as a replacement?
We took the generated structs from protoc, deleted the protobuf-specific code, leaving struct tags as necessary. We already had a translation layer between our db protos and API protos, so that layer didn’t need much additional work.
I guess the underlying problem is that protobuf messages are considered to be DTOs, and not primary entities.
While this is a common pattern in other languages, it is diametrically opposed to Go, imho.
Go walks quite a mile to have your primary entities to be used as DTOs as well -- one of its beauties. But I guess if one wants to use protobuf messages, it seems a DT layer is mandatory, as well as adding a translation of DTOs to primary entities and vice versa.
One thing I cannot wrap my head around, however, is that we are talking about protoc-gen-go: This is language specific, so it would not hurt other languages to have language-specific features in the code generation -- after all, the corresponding markup could be ignored in other languages by default.
I'm going to re-open this issue, but it does not mean that we're going to support this. I'd like to see more discussion on the feasibility of this. Some thoughts:
- I can see the usefulness of this feature for Go-heavy projects, but protobufs were designed as a language-agnostic way to interchange data. Baking more Go-specific details into the proto files seems to be antithetical to the philosophy of protobufs.
- This feature makes the assumption that generated Go protobufs are always structs with fields. What does it mean if we want the ability to generate opaque protobuf types that are interacted with via methods? Performance testing has shown that techniques like lazy unmarshaling can bring significant speed benefits, but they are only feasible with an opaque type.
- The proper way to support this seems to be using protobuf options, but I don't see any proposal here suggesting a syntax for how that would work.
this is an old issue but I still see tags in pb.go files so maybe a sane approach at a .proto3 level could be to add a CRD generator (protoc-gen-crd plugin) and then allow to use it as a base for stuff like knative objects used by controllers or maybe php ORM yml custom schemas.
I imagine there should be another issue for this discussion, and I am happy to open one, but this seemed like the place for the relevant discussion. That's especially true given the locking of #1142.
I'm sort of stumped by the outcome of #1142 given the language from @neild regarding type control. I second the question in #52 (comment), though I do understand that this is something of a slippery slope and may lead towards protocol buffer files which are too closely tied to an implementation.
What I'm really stuck on is nullability. I don't follow the principled argument against adding options to interact with the behavior of protoc-gen-go w.r.t. nullability. Running back through the thought experiment, it seems that such an option does not violate the language-switching principle.
For what it's worth, the performance wins of avoiding allocations have led cockroachdb/cockroach to use gogo/protobuf despite the fact that it is not maintained and will not obviously ever have support for the new library which is so much better. I suppose adding something like object pooling via something like what maybe is being discussed in #1192 could be helpful but tracking object lifetimes is quite hard.
It feels like this has been a prolonged philosophical debate but I feel like the philosophy got lost w/ struct tags, which I can get behind. That argument however has extended to cover other more agnostic portions of the generated code. Value types are a major feature of go to provide not just access locality but to dramatically reduce the number of allocated objects making it more reasonable for performance sensitive code despite having a garbage collector. The lack of ability to utilize this key property of the language without forking the code generator feels confusing and at odds with a commitment to support go as a first-class user of protocol buffers.
FWIW I do feel that controlling the type of the generated structs for messages is valuable but I don't have a clear vision on how to expose that protoc in a clean way just yet.
I've submitted #1225 as a concrete and focused issue related to the above comment which was off-topic for this issue.
I'd like to add that this feature would be extremely useful for validation purposes.
I have written a tool named protoc-gen-go-tag to provide this feature: custom struct tags for protobuf!
This seems to be nice, as it's implemented by protobuf and golang's AST.
- install:
go get -u github.com/searKing/golang/tools/cmd/protoc-gen-go-tag
- compile:
protoc -I . --go-tag_out=paths=source_relative:. *.proto
- examples:
https://github.com/searKing/golang/tree/master/tools/cmd/protoc-gen-go-tag/examples
- source code: https://github.com/searKing/golang/tree/master/tools/cmd/protoc-gen-go-tag
So, is there any news about this customization ability for protoc-gen?
tested https://github.com/srikrsna/protoc-gen-gotag
it works with buf
buf.gen.yaml
version: v1beta1
plugins:
  - name: go
    out: .
    opt: paths=source_relative
  - name: gotag
    out: .
    opt: paths=source_relative
  - name: go-grpc
    out: .
    opt: paths=source_relative,require_unimplemented_servers=true
  - name: validate
    out: .
    opt: lang=go,paths=source_relative
https://github.com/gogo/protobuf
a nice alternative to fuck this project.
Need this feature. A lot of requests and responses do not use standard lowercase JSON names, so we need this feature to customize the json tag.