TheThingsNetwork/lorawan-stack

Migrate away from gogo/protobuf

rvolosatovs opened this issue ยท 69 comments

Summary

https://github.com/gogo/protobuf is not mantained any more gogo/protobuf#691 (currently)

Why do we need this?

It's our dependency, which is incompatible with new golang/protobuf version, which more and more packages depend on, hence we need to replace the golang/protobuf version, depending on outdated versions of our direct dependencies and potentially even breaking packages this way

What is already there? What do you see now?

gogo/protobuf dependency

What is missing? What do you want to see?

Figure this out

How do you propose to implement this?

Figure out if a new maintainer will appear or different plugin with feature parity?
Use just vanilla protobuf?

How do you propose to test this?

tests

Can you do this yourself and submit a Pull Request?

yes

Validation plugin we're using dropped support for GoGo
bufbuild/protoc-gen-validate#340

I think the best way forward is to follow the ecosystem and migrate away from gogo/protobuf. With more and more of our other dependencies moving away from gogo, I think it will become increasingly difficult to keep using it. Of course it's going to be a lot of work to migrate, so if we do it, we need to come up with a good plan.

@rvolosatovs probably knows more about the custom options that are set in our gogottn generator, but here's what I found for the explicit options in our proto files:

  • We could start by removing the gogoproto.customname, gogoproto.stdtime and gogoproto.stdduration, and goproto_enum_prefix options. Those are relatively easy to remove, since the Go compiler will immediately complain about any resulting issues.
  • Removing gogoproto.embed option would mean that we can no longer access embedded fields (the Go compiler will help us find those), and that messages no longer satisfy some interfaces (this may be more difficult).
  • The gogoproto.nullable option will be much more work, because we will have to start using the getters, and add nil-checks. Resulting problems may not get caught by the Go compiler. Possible workaround would be to temporarily make those fields private, then rewrite to getters/setters and finally making the fields public again.
  • What's going to be quite problematic are the fields that use gogoproto.customtype and enums that use gogoproto.enum_stringer options. For those we've often changed the way they're marshaled/unmarshaled to JSON. For the custom bytes fields such as EUI, DevAddr etc. we could change the type (in the proto messages) to string (which is binary compatible). With the enums I'm afraid it's going to be breaking the JSON API, since those are now accepted (by UnmarshalJSON) as both strings and ints.

Maybe this is also a good time to start thinking about our v4 API, because I can imagine we might discover some more (API breaking) surprises.

I think the best way forward would be to first try out https://github.com/alta/protopatch. Depending on the result:

  • If all our needs are covered -> migrate and forget about this(should be just search-and-replace in api directory)
  • If only some of our needs are covered and there are custom options not covered by the plugin -> we should evaluate on per-option basis and either remove these custom options or, perhaps, contribute to the protopatch if it's a low-effort feature. This really depends on the option, though - if we're talking about customtype - that IMO definitely justifies contributing, but maybe something like stdtime - not so much.

Looking forward, I don't think we should be directly using vanilla protobuf protos in components internally at runtime(given the provided feature set of protobuf today).
It only makes sense to use protobuf for (de-)serialization, so for storage and on API layer. Internally, however, using plain vanilla generated Go protos makes no sense to me.
So, for example, NS:

  1. get *ttnpb.EndDevice (vanilla generated Go type) from the registry, deserialized from stored binary data
  2. convert *ttnpb.EndDevice into T_device, (NOTE: perhaps that could be just a wrapper initially or forever)
  3. use T_device internally in NS
  4. convert T_device into *ttnpb.EndDevice (NOTE: this could be a trivial, very fast task if we're using a wrapper, since we only need to modify changed fields and that could even be performed on binary data directly)
  5. set *ttnpb.EndDevice, serialize into binary data

Refs also #342 (generated populators)

I'm not in favor of a (smaller) alternative to gogo. It feels like pushing the can. Let's keep things as vanilla as possible, especially when we need to decide again what the best way forward is.

I do agree that we can consider using intermediary types in some places, instead of relying everywhere on the generated protos. It's basically separating data transfer objects (DTOs: protos, also for storage) from data access objects (DAOs: how we use them). If that is primarily reading, we can also declare interfaces and see how far we get with that.

That said, I wouldn't go as far as changing the entire NS to using T_device, but rather specific structs and/or interfaces as needed.

Let's move this discussion to November

@rvolosatovs what is your objection against moving to vanilla with a custom JSON marshaler?

The huge migration burden and loads of boilerplate if we end up just using vanilla protos directly.
I don't really object to that though, I just think we should try to find a simple non-intrusive alternative first and if that's not possible, then resort to reworking this whole thing.

I'm afraid that any plugin that we start to rely on will end up in an unmaintained state at some point. Generally speaking, I'm in favor of keeping things as close to vanilla as possible. If that means nil checking more often than we like, then so be it. It can also work in our favor that we know that things are not set, instead of an initialized struct.

I'm afraid that refactoring our entire codebase is going to be a pain no matter how we do it. Our (gogo-generated) proto structs are used everywhere right now (gRPC API, HTTP API, events, errors, internally, Redis DB, ...), so changing to something else (whatever that other thing is) is going to touch pretty much everything in our codebase, and the way it looks now, all at the same time.

The hard requirement is that we don't break compatibility of our v3 API. Even if we decide to use this situation as the moment to start working on a v4 API (at least internally), we will still have to keep supporting that v3 API for existing users.

In the long term, I think we would do ourselves a big favor by decoupling our (versioned, stable-within-major) external APIs from our internal (unversioned, stable-within-minor) API and our (versioned, stable) database documents. We could then write or generate functions to convert between our internal APIs and the others.

But I think there are some steps that we can already take now:

JSON

In order to keep our v3 JSON API compatible, I think our first TODO is to work on generating JSON marshalers and unmarshalers that understand how to marshal/unmarshal our custom types. I think doing this is smart anyway because there is no stability promise for Go's implementation of the JSON format for protocol buffers, so we better have control over that ourselves. Doing this could also allow us to consider field masks when marshaling to JSON. In the grpc-gateway runtime we can register codecs, so we can just write a codec that calls our own (generated) (un)marshalers instead of {gogo,golang}/protobuf's jsonpb.

Generate new protos next to the old ones

I already tried that here: a41f62d

This does make protobuf complain about the types registry, so we may need to remove golang_proto.RegisterType from our old protos to make this work. Removing that could potentially break resolving of google.protobuf.Any, but we only use those in errors and events, so we can probably find workarounds for those specific cases.

Generate functions to convert between old and new protos

This is for the transition period only, but for the long term solution, we'd want to generate similar converters.

Update services one by one

I already tried that with a simple service here: cd7d75c, but for more complicated services we'd definitely need those converters.

Note that this only changes the grpc service itself. The grpc-gateway still uses the old gogo stuff on the JSON side, and then calls the internal gRPC server, which then runs the new implementation.

Pushed some initial dependency updates and backwards compatibility work-arounds here: https://github.com/TheThingsNetwork/lorawan-stack/compare/issue/2798-codec

More and more of our dependencies are upgrading to protobuf 1.4 and the V2 API, and the longer we keep this open, the more problems we'll have when trying to upgrade our dependencies.

We should really give this some more priority and make a decision on what we're going to do about all this.

Please plan a call for next week so we can discuss offline.

I think we should go through this pain process and concentrate on solving this in a week or two. And to avoid that we do other things as this is going to cause lots of conflicts otherwise. Having as many hands as possible requires knowing exactly what we are going to do in which cases, dividing tasks as much as possible and keeping eyes on the prize.

Next steps:

  1. @rvolosatovs updates the tooling to be as close as possible to "vanilla":
  • Remove things like unconvert, gofumpt and whatever else we're doing on top of protoc
  • Switch from protoc-gen-gogottn to protoc-gen-gofast (or whatever is closests to vanilla)
  • Explicitly add (gogoproto.*) options in our proto files, so that they render the same as now
  1. We need to refactor our code to use Getters instead of direct field access. Perhaps tools like gopls and rf can help with this.
  2. We start removing (gogoproto.*) options one by one and update code that uses it. Perhaps tools like gopls and rf can help with this.
  • Removing gogoproto.populate and updating tests (#342)
  • Removing gogoproto.customname and changing EUI -> Eui etc.
  • Removing gogoproto.embed. We do need to make sure that messages still implement interfaces such as ValidateContext(context.Context) error and ExtractRequestFields(m map[string]interface{}).
  • Removing gogoproto.nullable and making sure we use Getters where possible, and do nil checks otherwise.
  • ...

@rvolosatovs let's try to make those first steps for v3.11.3. When that's done please re-add the other assignees, and let's discuss again.

I think it's time for the entire team to start working this, because I really can't (and don't want to) do this alone. It's very time consuming work, and often frustrating, but it needs to be done.

Before my vacation I've spent a lot of time on generating JSON (un)marshalers, which should let us remove the gogoproto.customtypes and hand-written JSON (un)marshalers.

Unfortunately, we still have quite some manual work to do for removing gogoproto.nullable and gogoproto.embed, and I'm afraid this will be a matter of removing hundreds of those options one by one, and making sure that the code that uses those fields doesn't break.

Things to look at:

  • Does the message still implement the interfaces that it did? This isn't caught by the compiler. As an example: many request messages currently implement ttnpb.IDStringer, which is used by rate limiting. Removing gogoproto.embed=true means that a request message may no longer implements that interface, which would break rate limiting. There are probably more of such interfaces, how can we catch those?
  • Do we not introduce nil pointer dereferences? When removing gogoproto.embed=true and/or gogoproto.nullable=false it's possible that some reads/writes to fields are not possible anymore, even though the compiler is perfectly happy.
  • What else?

I plan to remove some gogoproto.embed and gogoproto.nullable options from user.proto today.

I'll take up gogoproto.embed from deviceclaimingserver.proto today.

In the next pass, I'll take up gateway_services.proto and gateway.proto; the latter being pure drudgery.

I plan to remove some gogoproto.embed and gogoproto.nullable options from organization.proto today.

Will do the same for client.proto.

And today for application.proto.

@KrishnaIyer I'll do the GatewayIdentifiers in gateway.proto, because I've been doing the same for other identifiers in the last couple of days.

So for @adriansmares @bafonins @johanstokking @kschiffer @pgalic96:

  • In this round we are removing gogoproto.embed and gogoproto.nullable
  • We leave fields with gogoproto.customtype alone
  • Search in the repository to see which .proto files still have these options
  • Claim the file by posting a comment here
  • Remove the options, fix the rest of the code and submit a PR (see also #2798 (comment))

In the next pass, I'll pick up joinserver.proto and metadata.proto.

@KrishnaIyer I'd like to take over joinserver.proto as I'm working on a refactor (#4677).

Ok sure ๐Ÿ‘

Picking up the applicationserver_*.proto files.

I'm working on the remaining gogoproto.embed options in application.proto, client.proto, gateway.proto, organization.proto and user.proto today.

The following files still have gogoproto.embed options left:

  • api/applicationserver_web.proto
  • api/end_device.proto
  • api/identifiers.proto
  • api/identityserver.proto
  • api/join.proto
  • api/lorawan.proto
  • api/message_services.proto
  • api/messages.proto
  • api/metadata.proto
  • api/oauth.proto
  • api/rights.proto
  • api/search_services.proto

And these files (also) have gogoproto.nullable options that need to be removed:

  • api/application.proto
  • api/client.proto
  • api/deviceclaimingserver.proto
  • api/end_device.proto
  • api/events.proto
  • api/gateway.proto
  • api/gatewayserver.proto
  • api/identifiers.proto
  • api/identityserver.proto
  • api/join.proto
  • api/joinserver.proto
  • api/lorawan.proto
  • api/message_services.proto
  • api/messages.proto
  • api/metadata.proto
  • api/oauth.proto
  • api/organization.proto
  • api/qrcodegenerator.proto
  • api/regional.proto
  • api/rights.proto
  • api/search_services.proto
  • api/user.proto

Everyone please claim some more files!

As a reminder:

  • In this round we are removing gogoproto.embed and gogoproto.nullable
  • We leave fields with gogoproto.customtype alone
  • Remove the options, fix the rest of the code and submit a PR (see also #2798 (comment))

I'm picking up the gogoproto.embed options in identityserver.proto, oauth.proto, rights.proto and search_services.proto today.

Taking api/message_services.proto

Edit: This API in unused, will pickup events.proto and gateway.proto.

Today I'm going to remove gogoproto.nullable from timestamps in application.proto, client.proto, oauth.proto, organization.proto, rights.proto and user.proto.

Picking up embed/nullable in lorawan.proto for gateway related entities, will add the rest later.

I'm picking up api/messages.proto today. This one is a PITA.

Today I'll be working on getting rid of (gogoproto.stdtime) and (gogoproto.stdduration) options.

Today I'll pick up gatewayserver.proto and join.proto.

Today I'll pick up deviceclaimingserver.proto, lorawan.proto and qrcodegenerator.proto

Turns out that deviceclaimingserver.proto and joinserver.proto are already not using nullable (when not using customtype), embed, stdtime nor stdduration. I'll pick up regional.proto instead.

I'm also doing messages.proto as it's quite related

I'm picking up metadata.go and end_device.proto

I'll pick up api/identifiers.proto tomorrow since that's related to #4896.
This is one of our most used messages so this is going to be fun.

Here's what's left for the field options (still omitting fields with customtype):

  • api/identityserver.proto
EntityIdentifiers entity_ids = 2 [(gogoproto.nullable) = false, (validate.rules).message.required = true];
  • api/tti/external_user.proto
google.protobuf.Timestamp created_at = 3 [(gogoproto.stdtime) = true];
google.protobuf.Timestamp updated_at = 4 [(gogoproto.stdtime) = true];
  • api/end_device.proto
MessagePayloadFormatters default_formatters = 13 [(gogoproto.nullable) = false, (validate.rules).message.required = true];
google.protobuf.Duration class_b_timeout = 1 [(gogoproto.stdduration) = true];
google.protobuf.Duration class_c_timeout = 5 [(gogoproto.stdduration) = true];
google.protobuf.Duration status_time_periodicity = 16 [(gogoproto.stdduration) = true];
google.protobuf.Duration class_b_c_downlink_interval = 31 [(gogoproto.stdduration) = true];
MACParameters current_parameters = 1 [(gogoproto.nullable) = false, (validate.rules).message.required = true];
MACParameters desired_parameters = 2 [(gogoproto.nullable) = false, (validate.rules).message.required = true];
DLSettings downlink_settings = 6 [(gogoproto.nullable) = false, (validate.rules).message.required = true];
JoinRequest request = 2 [(gogoproto.nullable) = false];
SessionKeys keys = 3 [(gogoproto.nullable) = false, (validate.rules).message.required = true];
EndDevice end_device = 1 [(gogoproto.embed) = true, (gogoproto.nullable) = false, (validate.rules).message.required = true];
EndDevice end_device = 1 [(gogoproto.embed) = true, (gogoproto.nullable) = false, (validate.rules).message.required = true];
EndDevice end_device = 1 [(gogoproto.nullable) = false, (validate.rules).message.required = true];
EndDevice end_device = 1 [(gogoproto.nullable) = false, (validate.rules).message.required = true];
  • api/tti/configuration.proto
google.protobuf.Duration token_ttl = 2 [(gogoproto.stdduration) = true];
google.protobuf.Duration deduplication_window = 2 [(gogoproto.stdduration) = true];
google.protobuf.Duration cooldown_window = 3 [(gogoproto.stdduration) = true];
  • api/tti/entity_api_key.proto
ttn.lorawan.v3.EntityIdentifiers entity_ids = 1 [(gogoproto.nullable) = false];
  • api/tti/authentication_provider.proto
google.protobuf.Timestamp created_at = 2 [(gogoproto.stdtime) = true];
google.protobuf.Timestamp updated_at = 3 [(gogoproto.stdtime) = true];
  • api/tti/license.proto
LicenseIdentifiers id = 1 [(gogoproto.embed) = true, (gogoproto.nullable) = false, (validate.rules).message.required = true];
LicenseIssuerIdentifiers license_issuer_ids = 2 [(gogoproto.embed) = true, (gogoproto.nullable) = false, (validate.rules).message.required = true];
google.protobuf.Timestamp created_at = 3 [(gogoproto.nullable) = false, (gogoproto.stdtime) = true];
google.protobuf.Timestamp valid_from = 4 [(gogoproto.nullable) = false, (gogoproto.stdtime) = true];
google.protobuf.Timestamp valid_until = 5 [(gogoproto.nullable) = false, (gogoproto.stdtime) = true];
google.protobuf.Duration warn_for = 6 [(gogoproto.nullable) = false, (gogoproto.stdduration) = true];
google.protobuf.Duration limit_for = 7 [(gogoproto.nullable) = false, (gogoproto.stdduration) = true];
google.protobuf.Duration extend_valid_until = 1 [(gogoproto.stdduration) = true];
google.protobuf.Duration interval = 1 [(gogoproto.stdduration) = true, (gogoproto.nullable) = false];
TenantIdentifiers tenant_id = 1 [(gogoproto.embed) = true, (gogoproto.nullable) = false];

And here's the files that still have enum options:

  • api/rights.proto
  option (gogoproto.goproto_enum_prefix) = false;
  • api/enums.proto
  option (gogoproto.goproto_enum_prefix) = false;
  option (gogoproto.goproto_enum_prefix) = false;
  • api/metadata.proto
  option (gogoproto.goproto_enum_prefix) = false;
  • api/lorawan.proto
  option (gogoproto.goproto_enum_prefix) = false;
  option (gogoproto.enum_stringer) = false;
  option (gogoproto.goproto_enum_prefix) = false;
  option (gogoproto.enum_stringer) = false;
  option (gogoproto.goproto_enum_prefix) = false;
  option (gogoproto.enum_stringer) = false;
  option (gogoproto.enum_stringer) = false;
  option (gogoproto.goproto_enum_prefix) = false;
  option (gogoproto.goproto_enum_prefix) = false;
  option (gogoproto.goproto_enum_prefix) = false;
  option (gogoproto.goproto_enum_prefix) = false;
  option (gogoproto.goproto_enum_prefix) = false;
  option (gogoproto.goproto_enum_prefix) = false;
  option (gogoproto.goproto_enum_prefix) = false;
  option (gogoproto.goproto_enum_prefix) = false;
  option (gogoproto.goproto_enum_prefix) = false;
  option (gogoproto.goproto_enum_prefix) = false;
  option (gogoproto.goproto_enum_prefix) = false;
  option (gogoproto.enum_stringer) = false;
  option (gogoproto.goproto_enum_prefix) = false;
  • api/contact_info.proto
  option (gogoproto.goproto_enum_prefix) = false;
  option (gogoproto.goproto_enum_prefix) = false;
  • api/client.proto
  option (gogoproto.goproto_enum_prefix) = false;

What's the trick for getting rid of the enum options? What is the implication for JSON API compatibility?

I don't know yet, but I plan to look into that this week (unless someone else wants to). Other assignees can for the time being focus on the remaining nullable, embed, stdtime, stdduration options in the comment above.

All JSON API is now handled by our generated (un)marshalers and the (thethings.json.*) options in our .proto files.

@pgalic96 has taken over the work on the CLI flag handling, so that part should also be handled soon(ish).

I've started removing the first gogoproto.customtype options here: https://github.com/TheThingsIndustries/lorawan-stack/compare/issue/gogoproto-customtype. But until @pgalic96 is done with the CLI flag handling, we can only remove gogoproto.customtype options on response messages (or for anything that isn't used in the CLI).

Here's the latest list of what's left to do in this round. I left out (gogoproto.goproto_registration) (because we need that until we switch libraries), the various enum_prefix options (I'm currently working on that) and (gogoproto.customtype) (because that comes in the next round when @pgalic96 is done with the CLI flags stuff).

./api/_api.proto:28:option (gogoproto.sizer_all) = true;
./api/end_device.proto:276:  MACParameters current_parameters = 1 [(gogoproto.nullable) = false, (validate.rules).message.required = true];
./api/end_device.proto:278:  MACParameters desired_parameters = 2 [(gogoproto.nullable) = false, (validate.rules).message.required = true];
./api/end_device.proto:581:  EndDevice end_device = 1 [(gogoproto.embed) = true, (gogoproto.nullable) = false, (validate.rules).message.required = true];
./api/end_device.proto:585:  EndDevice end_device = 1 [(gogoproto.embed) = true, (gogoproto.nullable) = false, (validate.rules).message.required = true];
./api/end_device.proto:620:  EndDevice end_device = 1 [(gogoproto.nullable) = false, (validate.rules).message.required = true];
./api/end_device.proto:634:  EndDevice end_device = 1 [(gogoproto.nullable) = false, (validate.rules).message.required = true];
./api/tti/license.proto:24:  LicenseIdentifiers id = 1 [(gogoproto.embed) = true, (gogoproto.nullable) = false, (validate.rules).message.required = true];
./api/tti/license.proto:27:  LicenseIssuerIdentifiers license_issuer_ids = 2 [(gogoproto.embed) = true, (gogoproto.nullable) = false, (validate.rules).message.required = true];
./api/tti/license.proto:29:  google.protobuf.Timestamp created_at = 3 [(gogoproto.nullable) = false, (gogoproto.stdtime) = true];
./api/tti/license.proto:31:  google.protobuf.Timestamp valid_from = 4 [(gogoproto.nullable) = false, (gogoproto.stdtime) = true];
./api/tti/license.proto:33:  google.protobuf.Timestamp valid_until = 5 [(gogoproto.nullable) = false, (gogoproto.stdtime) = true];
./api/tti/license.proto:36:  google.protobuf.Duration warn_for = 6 [(gogoproto.nullable) = false, (gogoproto.stdduration) = true];
./api/tti/license.proto:38:  google.protobuf.Duration limit_for = 7 [(gogoproto.nullable) = false, (gogoproto.stdduration) = true];
./api/tti/license.proto:78:  google.protobuf.Duration extend_valid_until = 1 [(gogoproto.stdduration) = true];
./api/tti/license.proto:83:  google.protobuf.Duration interval = 1 [(gogoproto.stdduration) = true, (gogoproto.nullable) = false];
./api/tti/license.proto:119:    TenantIdentifiers tenant_id = 1 [(gogoproto.embed) = true, (gogoproto.nullable) = false];

Just went through the codebase, and it looks like we only have a few more options left:

  • gogoproto.goproto_registration: We need to keep this around until we've fully migrated away, otherwise google.protobuf.Any messages will stop working.
  • gogoproto.customtype (some combined with gogoproto.nullable): With the latest changes to protoc-gen-go-json, I think we can start removing the ones that aren't used by the CLI. For messages that are used by the CLI, we have to wait until protoc-gen-go-flags is ready.

Today I'll continue working on removing some gogoproto.customtype options (#2798 (comment)).

Here's the list of protos with customtype fields that we'll have to remove.

I think the safest migration path is to be "better safe than sorry" and replace the customtype option with:

  1. Validation option:
    (validate.rules).bytes = { len: 8, ignore_empty: true }
    
  2. JSON option (even if we currently don't marshal to/from JSON):
    (thethings.json.field) = {
      marshaler_func: "go.thethings.network/lorawan-stack/v3/pkg/types.MarshalHEXBytes",
      unmarshaler_func: "go.thethings.network/lorawan-stack/v3/pkg/types.Unmarshal8Bytes"
    }
    
  3. Flag option (even if we currently don't have flags for the field):
    (thethings.flags.field) = {
      set_flag_new_func: "go.thethings.network/lorawan-stack/v3/pkg/types.New8BytesFlag",
      set_flag_getter_func: "go.thethings.network/lorawan-stack/v3/pkg/types.GetBytesFromFlag",
    }
    

I already checked off the ones I started with in https://github.com/TheThingsIndustries/lorawan-stack/compare/issue/gogoproto-customtype

api/end_device.proto:

  • Session: bytes dev_addr = 2
  • MACState_JoinAccept: bytes dev_addr = 5
  • MACState_JoinAccept: bytes net_id = 6
  • EndDevice: bytes net_id = 23
  • DevAddrPrefix: bytes dev_addr = 1
  • GetEndDeviceIdentifiersForEUIsRequest: bytes join_eui = 1
  • GetEndDeviceIdentifiersForEUIsRequest: bytes dev_eui = 2

api/gateway.proto:

  • GetGatewayIdentifiersForEUIRequest: bytes eui = 1

api/tti/tenant.proto:

  • GetTenantIdentifiersForEndDeviceEUIsRequest: bytes join_eui = 1
  • GetTenantIdentifiersForEndDeviceEUIsRequest: bytes dev_eui = 2
  • GetTenantIdentifiersForGatewayEUIRequest: bytes eui = 1

api/tti/configuration.proto:

  • Configuration_Cluster_NetworkServer: repeated bytes dev_addr_prefixes = 1
  • Configuration_Cluster_JoinServer: repeated bytes join_eui_prefixes = 1

api/tti/license.proto:

  • License: repeated bytes dev_addr_prefixes = 12
  • License: repeated bytes join_eui_prefixes = 13

api/join.proto:

  • JoinRequest: bytes dev_addr = 3
  • JoinRequest: bytes net_id = 5

api/joinserver.proto:

  • SessionKeyRequest: bytes dev_eui = 2
  • SessionKeyRequest: bytes join_eui = 3
  • JoinAcceptMICRequest: bytes dev_nonce = 3
  • DeriveSessionKeysRequest: bytes join_nonce = 3
  • DeriveSessionKeysRequest: bytes dev_nonce = 4
  • DeriveSessionKeysRequest: bytes net_id = 5
  • ProvisionEndDevicesRequest_IdentifiersList: bytes join_eui = 1
  • ProvisionEndDevicesRequest_IdentifiersRange: bytes join_eui = 1
  • ProvisionEndDevicesRequest_IdentifiersRange: bytes start_dev_eui = 2
  • ProvisionEndDevicesRequest_IdentifiersFromData: bytes join_eui = 1
  • ApplicationActivationSettings: bytes home_net_id = 3
  • JoinEUIPrefix: bytes join_eui = 1
  • GetDefaultJoinEUIResponse: bytes join_eui = 1

api/messages.proto:

  • DownlinkQueueOperationErrorDetails: bytes dev_addr = 1
  • DownlinkQueueOperationErrorDetails: bytes pending_dev_addr = 4

api/keys.proto:

  • KeyEnvelope: bytes key = 1

api/networkserver.proto:

  • GenerateDevAddrResponse: bytes dev_addr = 1

api/metadata.proto:

  • PacketBrokerMetadata: bytes forwarder_net_id = 2
  • PacketBrokerMetadata: bytes forwarder_gateway_eui = 9
  • PacketBrokerMetadata: bytes home_network_net_id = 5

api/lorawan.proto:

  • FHDR: bytes dev_addr = 1
  • JoinRequestPayload: bytes join_eui = 1
  • JoinRequestPayload: bytes dev_eui = 2
  • JoinRequestPayload: bytes dev_nonce = 3
  • RejoinRequestPayload: bytes net_id = 2
  • RejoinRequestPayload: bytes join_eui = 3
  • RejoinRequestPayload: bytes dev_eui = 4
  • JoinAcceptPayload: bytes join_nonce = 2
  • JoinAcceptPayload: bytes net_id = 3
  • JoinAcceptPayload: bytes dev_addr = 4

api/application.proto:

  • IssueDevEUIResponse: bytes dev_eui = 1

api/packetbrokeragent.proto:

  • PacketBrokerGateway_GatewayIdentifiers: bytes eui = 2

api/identifiers.proto:

  • EndDeviceIdentifiers: bytes dev_eui = 4
  • EndDeviceIdentifiers: bytes join_eui = 5
  • EndDeviceIdentifiers: bytes dev_addr = 6
  • GatewayIdentifiers: bytes eui = 2
  • NetworkIdentifiers: bytes net_id = 1

api/deviceclaimingserver.proto:

  • ClaimEndDeviceRequest_AuthenticatedIdentifiers: bytes join_eui = 1
  • ClaimEndDeviceRequest_AuthenticatedIdentifiers: bytes dev_eui = 2
  • ClaimEndDeviceRequest: bytes target_net_id = 13
  • ClaimGatewayRequest_AuthenticatedIdentifiers: bytes gateway_eui = 1

Can we already branch off that particular branch, or should we wait until it is reviewed and merged ?

I did plan to fixup and rebase that branch when protoc-gen-go-flags is finished enough to make the CLI work, so let's be careful with that.

But if we coordinate who works on the branch on which day, then that should be okay too.

Here's an overview of the files that still need gogoproto.customtype removed:

  • api/deviceclaimingserver.proto
34:    bytes join_eui = 1 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
36:    bytes dev_eui = 2 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
70:  bytes target_net_id = 13 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.NetID"];
84:  bytes join_eui = 1 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
88:  bytes join_eui = 1 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
95:  bytes home_net_id = 2 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.NetID"];
96:  bytes home_ns_id = 3 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
186:    bytes gateway_eui = 1 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
  • api/end_device.proto
398:    bytes dev_addr = 5 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddr"];
399:    bytes net_id = 6 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.NetID"];
576:    (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.NetID",
682:  bytes dev_addr = 1 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddr"];
716:  bytes join_eui = 1 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
717:  bytes dev_eui = 2 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
  • api/gateway.proto
219:  bytes eui = 1 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
  • api/identifiers.proto
44:    (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64",
52:    (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64",
60:    (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddr",
74:    (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64",
144:  bytes net_id = 1 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.NetID"];
  • api/keys.proto
32:    (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.AES128Key",
  • api/lorawan.proto
164:  bytes dev_addr = 1 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddr"];
179:  bytes join_eui = 1 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
180:  bytes dev_eui = 2 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
181:  bytes dev_nonce = 3 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevNonce"];
203:  bytes net_id = 2 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.NetID"];
204:  bytes join_eui = 3 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
205:  bytes dev_eui = 4 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
211:  bytes join_nonce = 2 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.JoinNonce"];
212:  bytes net_id = 3 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.NetID"];
213:  bytes dev_addr = 4 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddr"];
  • api/messages.proto
344:  bytes dev_addr = 1 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddr"];
348:  bytes pending_dev_addr = 4 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddr"];
  • api/metadata.proto
127:  bytes forwarder_net_id = 2 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.NetID", (gogoproto.nullable) = false];
133:  bytes forwarder_gateway_eui = 9 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
137:  bytes home_network_net_id = 5 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.NetID", (gogoproto.nullable) = false];
  • api/networkserver.proto
36:  bytes dev_addr = 1 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddr"];
  • api/packetbrokeragent.proto
55:    bytes eui = 2 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
  • api/tti/configuration.proto
113:        (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddrPrefix",
130:        (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64Prefix",
  • api/tti/license.proto
52:  repeated bytes dev_addr_prefixes = 12 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddrPrefix", (gogoproto.nullable) = false];
54:  repeated bytes join_eui_prefixes = 13 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64Prefix", (gogoproto.nullable) = false];
  • api/tti/tenant.proto
110:  bytes join_eui = 1 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64", (gogoproto.nullable) = false];
111:  bytes dev_eui = 2 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64", (gogoproto.nullable) = false];
115:  bytes eui = 1 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64", (gogoproto.nullable) = false];

Copied instructions from #5439:

  1. Preparation: make tools/bin/mage

  2. Find a field with gogoproto.customtype in our .proto files

  3. Remove the gogoproto.nullable option and replace the gogoproto.customtype option with:

(validate.rules).bytes = { len: 4, ignore_empty: true },
(thethings.json.field) = {
  marshaler_func: "go.thethings.network/lorawan-stack/v3/pkg/types.MarshalHEXBytes",
  unmarshaler_func: "go.thethings.network/lorawan-stack/v3/pkg/types.Unmarshal4Bytes"
}
  1. If the message has a (thethings.flags.message) option, also add flag options:
(thethings.flags.field) = {
  set_flag_new_func: "go.thethings.network/lorawan-stack/v3/cmd/ttn-lw-cli/customflags.New4BytesFlag",
  set_flag_getter_func: "go.thethings.network/lorawan-stack/v3/cmd/ttn-lw-cli/customflags.GetExactBytes"
}
  1. Use the expected length in:

    • len in the validate.rules option
    • Unmarshal4Bytes in the thethings.json.field option
    • New4BytesFlag in the thethings.flags.field option
  2. Run tools/bin/mage proto:all

  3. Go to the .pb.go file corresponding to the thing you're editing, and

    • remove custom written getter for the field
    • use gopls to find all references the field, and fix the code using the .Bytes() method, or using types.MustDevAddr(xxx), together with .OrZero() if it expects a value type instead of a pointer type. It's also possible that you need to remove some of the .Bytes() or types.MustDevAddr(xxx) that were added in previous commits.
  4. Use a separate commit for each field, otherwise it may get messy. In my experience it was also easier to make the changes in TheThingsIndustries/lorawan-stack first and then backport them, instead of the other way around.

I'm picking up api/tti/* today.

And the other task I mentioned is https://github.com/TheThingsIndustries/lorawan-stack/issues/3187, so please also take a look there.

Picking deviceclaimingserver.protoand gateway.proto.

Note:

  • Based on a Slack discussion, Let's leave out repeated fields this round and get back to that.

Picking up end_device.proto, networkserver.proto and packetbrokeragent.proto

Picking up metadata.proto and messages.proto

Based on offline discussion

@KrishnaIyer is it just about removing this field?

@htdvisser can you list the things that need to be done to totally get rid of gogoproto?

Yes it's only one field in keys.proto but it's usage is broad and would require modifications to adapt it.

I don't have a complete list of everything that needs to be done to totally get rid of gogoproto, but here are the next steps that I've determined so far in the issue/go-proto-v2 branch:

  1. Removing customtype options (see also @KrishnaIyer's comment)
  2. https://github.com/TheThingsIndustries/lorawan-stack/issues/3187
  3. Re-generate protos using plugins that support the GoProtoV2 API
  4. Refactor the errors package to use the GoProtoV2 API (https://github.com/TheThingsIndustries/lorawan-stack/commit/ffbaaed64051b99d0321019efe4d007a61ce0262)
  5. Refactor the API to use the GoProtoV2 timestamp and duration types (https://github.com/TheThingsIndustries/lorawan-stack/commit/c22fb0dd390b2f62b7ef8acdf3d8d26ad1f5ebfa)
  6. Replace GoGo WKTs with GoProtoV2 WKTs (https://github.com/TheThingsIndustries/lorawan-stack/commit/8fc7f1fd469c1085b94f0bda3757913b3298da47)
  7. Update protobuf and grpc-gateway imports (https://github.com/TheThingsIndustries/lorawan-stack/commit/7d054afb96621cb23f31e1e74362549178ec844c https://github.com/TheThingsIndustries/lorawan-stack/commit/87b0f20e5b2aa0ab20f0b7c63a95847a38786784 https://github.com/TheThingsIndustries/lorawan-stack/commit/510854c7e6c95152e7a643700f7d5bdf1343a9c6)

Once I reached that point, I was getting too many compiler errors to make any more progress, but I think that those will resolve once (1) and (2) are done, and we'll be able to continue.

Currently working on keys.proto and trying to manage the errors explosion ... turns out we use session keys and root keys quite a lot

Step 1 and 2 from the comment above are done (๐ŸŽ‰).

I've started looking into updating protoc-gen-validate based on the comment above and some details have changed since the last update:

At this point in time I'm not in favor of creating a new protoc-gen-go-validate. I think we should 'reset' our fork to the latest release of protoc-gen-validate and check if we can cherry-pick our changes (the main ones are bufbuild/protoc-gen-validate@c97df82, bufbuild/protoc-gen-validate@4b98317 and bufbuild/protoc-gen-validate@3549aa7). This should suffice (๐Ÿคž) in order for us to move forward. I'll take a look tomorrow into how feasible this is.

I've built two new releases of protoc-gen-validate and protoc-gen-fieldmask that are based on the V2 proto API. They are based on the new release of protoc-gen-star that enables our generators to generate V2 API code.

I've updated the tooling in a separate branch to use the new protoc / protoc-gen-go tooling and remove gogo. You can see the diff here. In our generated code (paths, setters, validate) the changes are minimal to none.

Please sample some files and let me know what you think. If things look good, we can start implementing the commits from #2798 (comment) on that branch, and finally move to API v2.

Migration status update:

In order to speedup the development, we can already start backporting some changes from https://github.com/TheThingsIndustries/lorawan-stack/pull/3445 in order to lower the diff:

  • Embedding the ttnpb.Unimplemented*Server server implementations into our server implementation.
    The background here is that the v2 API requires that every server implementation publicly embeds the unimplemented implementation.
    I have already done the changes in https://github.com/TheThingsIndustries/lorawan-stack/pull/3445 , but due to poor commit management backporting these changes is hard. The general idea is to do the following:
  1. Find a RegisterServices(s *grpc.Server) call site:
    // RegisterServices registers the Configuration service.
    func (c *ConfigurationServer) RegisterServices(s *grpc.Server) {
    ttnpb.RegisterConfigurationServer(s, c)
    }
  2. Go to the definition of the implementation (second parameter):
    // ConfigurationServer implements the Configuration RPC service.
    type ConfigurationServer struct {
    component *Component
    }
  3. Add the ttnpb.Unimplemented*Server to the struct:
    https://github.com/TheThingsIndustries/lorawan-stack/blob/0bfac6481e31ca26946e3b39fd15a1f585b85e26/pkg/component/configuration_grpc.go#L32-L37
    You're now done.

Migration status update:

  • The monolithic commit containing all of the changes has been completely broken down. All of the non-migration specific changes have been backported to v3.23.
  • The test resemblance issue has been fixed. After the migration, we will be using go-cmp in order to do assert value resemblance. We currently twist its hand a bit in order to make it behave like reflect.DeepEqual, but long term we should consider if we can remove this.
  • Unit test, Console end to end tests and traffic end to end test pass.

After the Christmas holidays we can start merging back this mammoth change in enterprise and open source.

The only problem to settle is backwards compatibility with respect to the JSON API. The context is as follows:

  1. For a large part of v3's lifetime, field masks were rendered as objects, not as strings. This behavior was caused by the gogoproto implementation of jsonpb.
    In this context, object form means {"paths": ["a.b.c", "c.d.e"]}, and string form means "a.b.c,c.d.e".
  2. When we started using protoc-gen-go-json one year ago, we accidentally broke the API and caused some field mask to render as strings. Messages which contained a field mask and an EndDevice ended up having a custom JSON marshaler, which in turn caused field masks to be rendered as strings. Messages which did not, such as ApplicationWebhook, ended up having field masks rendered as objects.

@ysmilda has implemented (1) which allows us to generate the marshallers for every message, thus ensuring that at least we use the same style everywhere, and (2) allows us to choose which style we want to use.

In https://github.com/TheThingsIndustries/lorawan-stack/pull/3445 I actually 'broke back' the API to render all field masks as objects, in order to keep consistence over time. My argument here is that for most of v3's lifetime and for all of the code in the wild, the object form is what is used.

Are there any objections to 'breaking the API' a second time to keep things consistent to what they were a year ago ? Our JSON is not JSONPb anyway (custom renderers, custom enum marshaling). @johanstokking @KrishnaIyer


Unmarshalling fieldmasks is not a problem - protoc-gen-go-json automatically can unmarshal both styles at the same time.

Are there any objections to 'breaking the API' a second time to keep things consistent to what they were a year ago ? Our JSON is not JSONPb anyway (custom renderers, custom enum marshaling). @johanstokking @KrishnaIyer

Yes I prefer consistency, even if we need to break the JSON API.