cosmos/cosmos-sdk

Protobuf Transaction Signing

Closed this issue · 83 comments

Problem Definition

The Cosmos SDK has historically used Amino JSON for signing transactions, whereas Amino binary is used for encoding.
During the SDK's migration to protobuf, we made the preliminary decision to use a canonical protobuf JSON encoding for signing as described in https://github.com/regen-network/canonical-proto3.

As a consequence of #6030, the Cosmos SDK is moving in the direction of using protobuf's Any type for the transaction encoding and signing format. In this discussion, a number of participants have asked that we revisit the transaction signing discussion. The options that have been discussed/are available for consideration are outlined below.

It should be noted that it is theoretically possible to support more than one of the following options via an enum flag on the signature. Whether that should or should not be done is a related question.

Proposals

Feel free to suggest updates to these alternatives in the comments

(1) Protobuf canonical JSON

The official proto3 spec defines a canonical mapping to JSON. This is not really deterministic, however, so we define a canonical encoding on top of that using https://gibson042.github.io/canonicaljson-spec/.

Pros:

  • By including field names, JSON is more self-describing than proto and is somewhat human-readable
  • Can prevent certain user errors made when manually copying proto definitions or if proto definitions are changed between versions (shouldn't be done)
  • Relatively easy migration for existing signing tools like the Ledger app
  • Causes transaction verifiers to reject unknown fields which is generally correct

Cons:

  • Converting to another format for signing may introduce transaction malleability vulnerabilities - i.e. when different encoding representations map to a single signing representation. This is known to exist, at least for Timestamp, although that malleability is likely un-exploitable.
  • Requires clients to have proto JSON and canonical JSON in addition to protobuf. There are very few implementations of canonical JSON (only Go as far as we know) and some protobuf implementations don't implement proto JSON correctly (none of the Rust ones do, for example)
  • Doesn't bech32 encode addresses, pubkeys, etc. like Amino JSON
  • By default, encodes fields in lowerCamelCase, i.e. some_field becomes someField
  • Special treatment for well known types (e.g. Timestamp, Any, Duration, Struct) not necessarily supported well in existing libraries.
  • Any kind of renaming of field names leads to breaking changes of the message

(2) Protobuf canonical binary

This involves re-encoding the protobuf used for transaction encoding canonically for signing - meaning that fields must be ordered and defaults omitted. This is how Weave does signing.

Pros:

  • Simpler for clients to implement, most protobuf implementations serialize things canonically
  • Canonicalization could prevent a certain small class of user errors (if the wrong protobuf definition was used to encode a message)
  • Renaming fields is a non-breaking change (as long as semantics don't change)
  • Causes transaction verifiers to reject unknown fields which is generally correct

Cons:

  • Introduces a subtle transaction malleability vulnerability if modules attempt to use proto2 semantics - i.e. interpret null/omitted differently from default/zero
  • Not all protobuf implementations serialize things canonically and may require an additional canonicalization layer
  • Doesn't prevent user errors made to reordering fields when copying or modifying proto files (like JSON does), but this should be much less of an issue using Any (because the full type URL is included and oneof's are not copied manually) and breaking change checkers like Buf/Prototool
  • Hard to implement in the Ledger app for many/arbitrary message types

(3) Protobuf binary as encoded in transaction

This simply uses the protobuf encoding as broadcast in the transaction. This becomes a little easier for both signature creation and verification because of Any (although it could be done without Any too). Because Any wraps the raw bytes of the sdk.Msg, it is pretty easy to use these same exact bytes for signing and verification, and only require that SignDoc itself is encoded canonically, rather than every Msg.
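
As a rough illustration of reusing the exact bytes, here is a minimal Go sketch that hand-encodes such a SignDoc (field numbers and values are purely illustrative and not the SDK's actual schema or API; binary.AppendUvarint requires Go 1.19+):

package main

import (
    "crypto/sha256"
    "encoding/binary"
    "fmt"
)

// appendBytesField appends a length-delimited protobuf field (wire type 2)
// with the given field number and raw payload.
func appendBytesField(buf []byte, fieldNum uint64, payload []byte) []byte {
    buf = binary.AppendUvarint(buf, fieldNum<<3|2)
    buf = binary.AppendUvarint(buf, uint64(len(payload)))
    return append(buf, payload...)
}

// appendUintField appends a varint field (wire type 0).
func appendUintField(buf []byte, fieldNum, value uint64) []byte {
    buf = binary.AppendUvarint(buf, fieldNum<<3|0)
    return binary.AppendUvarint(buf, value)
}

func main() {
    // rawAnyMsg stands in for an already-encoded Any taken verbatim from the
    // broadcast transaction (left empty here).
    var rawAnyMsg []byte

    // Only this flat, top-level SignDoc needs to be encoded canonically; the
    // embedded msg bytes are reused exactly as they appear in the tx.
    var signDoc []byte
    signDoc = appendBytesField(signDoc, 1, rawAnyMsg)         // msgs
    signDoc = appendBytesField(signDoc, 4, []byte("testing")) // chain_id
    signDoc = appendUintField(signDoc, 5, 4)                  // account_number
    signDoc = appendUintField(signDoc, 6, 3)                  // sequence

    fmt.Printf("sign bytes digest: %x\n", sha256.Sum256(signDoc))
}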

Pros:

  • Simpler for clients to implement than even (2). All implementations should pretty much just work
  • Is not vulnerable to the proto2-semantics transaction malleability that affects (2)

Cons:

  • Same as (2) in not preventing user errors related to manually copying or tampering with proto definitions, although less of an issue with Any
  • Does not cause transaction verifiers to reject unknown fields (unlike 1, 2 & 4) which may lead to unexpected behavior for clients

(4) Amino JSON

This is how txs are signed currently. The reason this is under consideration is that breaking Amino JSON signing would break many clients, especially the Ledger app. Transactions could still be encoded with protobuf, and the /tx/encode endpoint could accept Amino JSON and return protobuf rather than Amino binary for tx broadcasting - some upfront work would be required to enable this but it is possible.

Pros:

  • Doesn't break existing clients right away
  • Doesn't introduce new transaction malleability issues that weren't already there
  • Bech32 encodes addresses, pubkeys, etc. for better readability than (1)
  • Causes transaction verifiers to reject unknown fields which is generally correct

Cons:

  • Delays the deployment of protobuf signing, although Amino JSON signing could be enabled via a flag on Signature so maybe this is a non-issue
  • Requires an additional layer on top of protobuf with information that isn't conveyed by the protobuf schema and so far lacks extensive library support

(5) Custom Proto JSON

Extend (1) to support custom encoding of certain types like bech32 addresses

Pros:

  • The same benefits as Amino JSON for signing with embedded devices but derived from .proto files, i.e. human readable bech32 addresses and keys
  • An easier migration path than (1)
  • Does not try to re-use an encoding that was never designed to be deterministic

Cons:

  • Requires a possibly even more complex custom JSON layer on top of protobuf plus extensions to protobuf fields to indicate custom formatting like bech32
  • Every signature verifier (on and off-chain) needs to maintain support for all options (forever)

Related Question: Should we allow multiple signing algorithms?

Theoretically we can allow clients to use multiple signing algorithms and indicate which one they used as an enum flag on the Signature struct.

Pros:

  • allows clients to sign with the method they are able to use and feel most secure with
  • allows Amino JSON compatibility without delaying protobuf signing support

Cons:

  • increases the number of vulnerability paths that need auditing
  • increases maintenance requirements for supporting multiple paths
  • leaves too much responsibility to clients rather than making an opinionated, well-informed decision

I do want to say that after considering the above, my personal preference is to go with approach (3) (signing the raw binary) as default and to allow (4) Amino JSON for a limited period of time while clients transition.

(3) seems to offer the simplest UX for clients as well as the fewest malleability issues. Given that Any should minimize user errors from manually manipulating .proto files, I don't see that as being as big of an issue as with Amino or the proto oneof approach.

Supporting (4) temporarily through a flag doesn't seem to introduce any new issues that aren't already here, and has the big benefit of not disrupting wallets, exchanges, etc. that don't have time to transition overnight.

Thank you for writing this up @aaronc !

Here are the problems I see with 3.
Protobuf deserializing in embedded/wasm environments is extremely difficult. This is why we are actually avoiding protobuf in Armistice.

This basically means that you can't decode the transactions inside your Trusted Computing Base and you need to extend your trust boundaries around signing to include another system that will deserialize your protobufs for you. This is going to be a disaster for practical security.

Strongly prefer 4 and 1, because Amino JSON serialization in good-enough form is pretty easy to implement and widely available. proto3 canonical JSON actually looks okay, but I'm not really sure how widespread support for this is. I think prost doesn't support this?

@zmanian why is protobuf deserialization in embedded environments so hard? Have you looked at stuff like https://github.com/nanopb/nanopb? Even if that didn't work, I don't think a hand decoder should be too hard. The JSON stuff in the ledger app appears to be mostly hand coded.

I'm fine with enabling Amino JSON for compatibility, but we should have an alternative going forward that just requires .proto files.

Regarding this:

Theoretically we can allow clients to use multiple signing algorithms and indicate which one they used as an enum flag on the Signature struct.

...it's reminiscent of the alg field in JOSE JWT/JWS, which has been implicated in numerous security vulnerabilities (e.g. this recent one), namely via an ongoing history of implementation bugs which leverage an attacker-controlled alg into tricking a verifier into using the wrong algorithm.

An alternative is to make alg a property of public keys instead of signatures, ala X.509 Subject Public Key Info (SPKI).

This gives you a strong binding between an algorithm and a public key, known a priori to the verifier, and with that, a signature can name the "SPKI" (or SPKI hash)

Some discussion here
https://www.reddit.com/r/rust/comments/aequik/is_there_a_no_std_compatible_protobuf_library_out/

My guess would be that nanopb would be too big for the ledger.

Adding a C deserialization library seems very much against our goals of minimizing C in the TCB.

Also pretty strongly support the idea of binding the serialization system to the public key to support forward migration!

@aaronc

why is protobuf deserialization in embedded environments so hard?

The Protobuf Schema language lacks descriptions for the (maximum) size of variable-length fields.

This means it isn't possible to codegen a struct with a fixed-sized APDU-like structure which is typically used in heapless (e.g. microcontroller) environments.

Libraries which do support Protobufs in embedded environments therefore tend to work a level of abstraction below a typical Protobuf library which generates message types directly from the schema. This is also bad from a code size perspective, because it punts all of the work of size-checking the underlying fields to the end user.

...it's reminiscent of the alg field in JOSE JWT/JWS, which has been implicated in numerous security vulnerabilities (e.g. this recent one), namely via an ongoing history of implementation bugs which leverage an attacker-controlled alg into tricking a verifier into using the wrong algorithm.

Okay, so that's definitely a consideration for supporting multiple algorithms. I do want to note that we're talking about supporting maybe 2-3 algorithms, and "none" wouldn't be one of them. JWT seems to support ~20.

Also pretty strongly support the idea of binding the serialization system to the public key to support forward migration!

Would there be an option to change the binding to a newer system?

https://www.reddit.com/r/rust/comments/aequik/is_there_a_no_std_compatible_protobuf_library_out/

In that thread it seems there are at least 2 users who hacked together a way to do it. So definitely possible, just not standardized in a library yet.

The Protobuf Schema language lacks descriptions for the (maximum) size of variable-length fields.

This means it isn't possible to codegen a struct with a fixed-sized APDU-like structure which is typically used in heapless (e.g. microcontroller) environments.

How is this any different from JSON? Strings/byte arrays have no max length in JSON either. If you had fixed sized arrays, you would need to truncate strings with either json or pb. But I'm not even sure why you'd need to do that sort of copying into a struct. From my quick glance at the ledger app source code, it seems like one of the key things it's trying to do is display info in fields to users. You could do the same thing in protobuf with zero allocation. Just iteratively navigate through the message like you would with JSON and keep track of what level of nesting you're at (which should be possible with a fixed depth). Strings should be easier to extract from protobuf because you don't need to copy to remove escape chars. And addresses need a small array to convert from bytes to bech32.
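
For what it's worth, a rough Go sketch of that kind of allocation-free, event-driven walk over the wire format (hand-rolled, not tied to any particular protobuf library; fixed32/fixed64 wire types omitted for brevity):

package protowalk

import (
    "encoding/binary"
    "fmt"
)

// Walk iterates over the fields of a single protobuf message level without
// allocating per-field structs. The visit callback receives the field number,
// wire type, and raw value bytes; to descend into an embedded message, call
// Walk again on the value, tracking depth with a counter so it stays bounded.
func Walk(buf []byte, visit func(fieldNum, wireType uint64, value []byte)) error {
    for len(buf) > 0 {
        key, n := binary.Uvarint(buf)
        if n <= 0 {
            return fmt.Errorf("invalid field key")
        }
        buf = buf[n:]
        fieldNum, wireType := key>>3, key&7

        switch wireType {
        case 0: // varint (integers, bools, enums)
            _, m := binary.Uvarint(buf)
            if m <= 0 {
                return fmt.Errorf("invalid varint value")
            }
            visit(fieldNum, wireType, buf[:m])
            buf = buf[m:]
        case 2: // length-delimited (strings, bytes, embedded messages)
            l, m := binary.Uvarint(buf)
            if m <= 0 || uint64(len(buf)-m) < l {
                return fmt.Errorf("truncated length-delimited field")
            }
            visit(fieldNum, wireType, buf[m:m+int(l)])
            buf = buf[m+int(l):]
        default:
            return fmt.Errorf("unsupported wire type %d", wireType)
        }
    }
    return nil
}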

Anyway, I definitely can understand the concern of not wanting to rewrite a bunch of firmware, thus my support for keeping an Amino JSON option. But these problems with protobuf to me seem solvable...

How is this any different from JSON?

JSON and Protobufs are both problematic in this regard. There are some optimizations you can do for JSON as a proper context-free grammar, but ultimately on platforms with low memory the main one is ensuring either JSON or Protobufs have a well-known fixed-sized structure.

Just iteratively navigate through the message like you would with JSON and keep track of what level of nesting you're at

I've been working on this sort of pushdown automaton in Veriform, as it were.

For some extreme low-end embedded environments, even that sort of thing is too much (our target is a 500+MHz Cortex-A environment, whereas the problematic environments are much lower clocked Cortex-M ones with much smaller stacks)

I want to jump in with an argument against Amino JSON. Basically, if we keep using that, we will need to support that tooling on all platforms. Much less work than binary Amino, but over 2 years there has been amazingly little work by the core team to port any of Amino, and assuming all this magically happens "from the community" is wishful thinking. If we need to stay with Amino JSON, then I would say that porting client side libraries in major languages (not just JS and Rust, but say Java/Kotlin and ObjC/Swift at the minimum, ideally Python and some more) comes into scope as part of this migration. If we can use tooling that already works out of the box in all these languages, then that is much easier.

The strong valid criticism I see above is that it is hard to parse Protobuf in an HSM.

My guess would be that nanopb would be too big for the ledger.
Adding a C deserialization library seems very much against our goals of minimizing C in the TCB.

I encourage you to look at the Ledger app that Juan (same dev who wrote Cosmos Ledger app) wrote for IOV: https://github.com/iov-one/ledger-iov There isn't too much code there for the parsing. Actually grabbing a few fields from a predefined protobuf format is quite easy, and doesn't require parsing arbitrary structs into memory.

It parses the Protobuf signing format we use for transactions and displays it to the users. It didn't take him that long to do it (I believe less than the original Cosmos app). I just want to say that using protobuf is not some new and untested idea, but was used for over 2 years now and on a mainnet. That IOV has no clue about business and marketing doesn't mean the code there has no technical merit. I advise you to borrow liberally.

JSON and Protobufs are both problematic in this regard. There are some optimizations you can do for JSON as a proper context-free grammar, but ultimately on platforms with low memory the main one is ensuring either JSON or Protobufs have a well-known fixed-sized structure.

Okay, well sounds like this problem would only be solved by something like ASN.1 which isn't really an option. But either way sounds like we can actually deal with the memory problem with an event driven parser as opposed to decoding into a struct. So then the issue becomes code size, right?

Here I can see one potential argument in favor of JSON. JSON is self-describing so if you don't have the schema, you could still iteratively display each raw field on the JSON on a device like the ledger. To do that in protobuf, you would need to include the schema for every type or limit support to just a few types to limit code size. Is that one of the tradeoffs you're seeing @tarcieri ?

One point I will grant to Amino JSON is that it bech32 encodes addresses, pub keys, etc. Say you wanted to iteratively display every element of a JSON object in the ledger without the schema, with protobuf you would get base64 whenever you don't know the schema. Maybe that's not a hard problem to solve, but I do see it as a valid good thing about Amino JSON.

To do this with protobuf we would need a custom JSON serialization format which maybe indicated the bech32 type of bytes fields with an extension (i.e. bytes key = 1 [(cosmos_proto.bech32_type) = "valpub"]). That's maybe not a bad idea to include in the .proto files anyway, but as @ethanfrey noted the more custom work clients need to do the bigger the burden.
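
For example (hypothetical message and field names; cosmos_proto.bech32_type is the not-yet-existing extension suggested above), a custom JSON encoder could then render these bytes fields as bech32 strings instead of base64:

message MsgDelegate {
  bytes delegator_address = 1 [(cosmos_proto.bech32_type) = "acc"];
  bytes validator_address = 2 [(cosmos_proto.bech32_type) = "valoper"];
  bytes validator_pubkey  = 3 [(cosmos_proto.bech32_type) = "valpub"];
}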

Anyway, I will include this as a new option (5) and update the pros and cons of the other options to reflect this discussion.

I do still think there is an elegance to approach (3) and if it did work for embedded devices, maybe following the example in https://github.com/iov-one/ledger-iov, that might make everyone's lives easier.

This self-describing nature of JSON allows the Cosmos app to work with any number of chains out of the box.

This self-describing nature of JSON allows the Cosmos app to work with any number of chains out of the box.

That and the fact that they all use a superset of the schema the ledger app understands.

It checked for e.g. .msgs[0].type == "cosmos-sdk/send" and then .msgs[0].data.amount[0].amount. If we encoded it in JSON but with different keys, this wouldn't work.

Note that with option (3), protobuf is also self-describing to a degree. At least you can check the type they claim and will not mix up cosmos-sdk/send and cosmos-sdk/burn. It will not know how to display any type that it was not compiled for. But then again the JSON parser wouldn't either - it can just display the raw JSON to the user. Is that mode actually used (display raw JSON of sign bytes to the end user)? If this is a common use-case (pass raw bytes to the end user to interpret), then there is a big bonus for JSON. Otherwise, I don't see how self-describing helps an app that checks hardcoded fields - it just helps avoid mixups (like the Any type field does)

If this is a common use-case (pass raw bytes to the end user to interpret), then there is a big bonus for JSON.

I want to caution against weighting this case too highly in the context of a generic user interface. A user interface that is made up of simply printing out the json key/value pairs without understanding the underlying message format will yield a strictly poor user experience for most "end" users. I feel like there is an important distinction between the audience of developers using the system and the users the developers are intending to support. The developers should be familiar with Proto and the tooling required to deal with it ... so the Json step is strictly speaking an extra effort that may or may not support the needs of the users the developers are working for.

@iramiller

I want to caution against weighting this case too highly in the context of a generic user interface. A user interface that is made up of simply printing out the json key/value pairs without understanding the underlying message format will yield a strictly poor user experience for most "end" users.

To further emphasize this, I believe Ledger has required moving from displaying raw JSON to a UI which extracts, displays, and confirms values in the message for exactly these reasons.

@aaronc

Okay, well sounds like this problem would only be solved by something like ASN.1 which isn't really an option.

For what it's worth, ASN.1 DER solves this problem poorly as well. Most embedded implementations of DER are actually BER parsers that don't even verify the BER is canonical (and therefore DER).

The most embedded friendly formats follow an "APDU"-like structure (i.e. fixed-sized fields everywhere). You can get similar properties out of either Protobufs or JSON if you ensure all of the fields in either a message are constant-length (by using e.g. fixed integer types and fixed-length bytes fields with Protobufs).
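
To sketch what that could look like in a .proto file (hypothetical message; proto3 has no native fixed-length bytes, so the address length would still have to be enforced by validation):

message FixedSend {
  fixed64 amount     = 1; // always 8 bytes on the wire
  fixed32 sequence   = 2; // always 4 bytes on the wire
  bytes   to_address = 3; // expected to be exactly 20 bytes, checked by the application
}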

The most embedded friendly formats follow an "APDU"-like structure (i.e. fixed-sized fields everywhere). You can get similar properties out of either Protobufs or JSON if you ensure all of the fields in either a message are constant-length (by using e.g. fixed integer types and fixed-length bytes fields with Protobufs).

@tarcieri Would you consider Cap'n Proto embedded friendly? (Not that it's an option anytime soon...)

Not particularly. Cap'n Proto is significantly more complicated than Protobufs (see for example message segments and inter-segment pointers)

Great stuff here, everyone. A few additional 👍 / 👎 for the main list that you are free to merge somehow:


(1) Protobuf canonical JSON

Pros:

  • Relatively easy migration for existing signing tools like the Ledger app

Cons:

  • Special treatment for well known types (e.g. Timestamp, Any, Duration, Struct) not necessarily supported well in existing libraries.
  • Any kind of renaming of field names leads to breaking changes of the message

(2)

Pros:

  • Renaming fields is a non-breaking change (as long as semantics don't change)

Cons:

  • Hard to implement in the Ledger app for many/arbitrary message types

(3)

Cons:
  • Impossible due to circular dependency: Protobuf binary as encoded in transaction contains signatures but we don't have the signature(s) yet.

(4)

Cons:

(5)

Pros:

  • Does not try to re-use an encoding that was never designed to be deterministic

(multiple signing algorithms?)

Cons:

  • Every signature verifier (on and off-chain) needs to maintain support for all options (forever)

@zmanian if there was a JSON document that is signed, would you expect a JSON->proto conversion to be possible, i.e. you only operate on JSON and assume this can be translated back to the transaction format understood by Tendermint (proto binary). Or would it be sufficient to create a JSON document from a proto document (one way function proto->JSON), which is then sent to chain?

Amino allows two-way mappings, but this is significantly harder to get right than one way mappings.


While IOV's Ledger app is great for IOV's use cases, I think it is important to note that as of now it only supports a single message type (with ongoing work to add a handful more). Chain ID and address prefix are compile time constants with just a boolean testnet/mainnet flag.

(3)

Cons:

  • Impossible due to circular dependency: Protobuf binary as encoded in transaction contains signatures but we don't have the signature(s) yet.

Not true. Maybe re-read how I framed it and look at how Any is encoded. I never suggested signing the transaction, just the SignDoc which can contain the exact same pre-encoded Any msg's as the transaction.

Sorry @aaronc, you're completely right. What would probably help (at least for me) is to repeat in the top what we are talking about: the encoding of the SignDoc structure from ADR 20 from master plus the oneof-to-Any change similar to #6081.

To further emphasize this, I believe Ledger has required moving from displaying raw JSON to a UI which extracts, displays, and confirms values in the message for exactly these reasons.

My take here is that having schema aware signers should be an enhancement of the baseline signing experience.

Prior art in Ethereum, Bitcoin etc is that if your signer isn't aware of the schema you are signing then you are signing pretty much opaque bytes. By using json as the signing target, you get an enhanced experience if the signer is aware of the schema and fall back to something somewhat human readable.

I can imagine a format that is easier to implement in Rust and other languages than Amino JSON.
Like this is sort of awkward
https://github.com/iqlusioninc/deep_space/blob/develop/src/canonical_json.rs#L9-L34

But also approx 10 LOC to implement.

I'm generally in favor of 1 or 4.

@zmanian if there was a JSON document that is signed, would you expect a JSON->proto conversion to be possible, i.e. you only operate on JSON and assume this can be translated back to the transaction format understood by Tendermint (proto binary). Or would it be sufficient to create a JSON document from a proto document (one way function proto->JSON), which is then sent to chain?

Amino allows two-way mappings, but this is significantly harder to get right than one way mappings.

It's totally fine if you need a protobuf schema to turn the json into bytes on the wire.

The general pattern is that signers are resource constrained and difficult to update, while serializers are much more flexible.

It's totally fine if you need a protobuf schema to turn the json into bytes on the wire.

@zmanian okay, that's level 1 independence. Level 2: what if it were not possible at all to map back JSON->proto? Instead you must use the proto doc from the composing environment plus the signature:

    p2p network            Composing environment      Signing environment
------------------        -----------------------    ---------------------
                          
                            unsigned tx proto  --------->  SignDoc (JSON)
                                     |                          |
                                     |                          | sign
                                     |                          |
                                     v                          v
 signed tx bytes  <----------  signed tx proto  <-----------  signature
                    serialize 

I'm generally in favor of 1 or 4.

So I'd like to take 4 (Amino JSON) off the table as a long term solution. Short-term, sure. But if you want something like Amino JSON, let's consider 1 or 5 where all of the information is in the .proto files.

@webmaster128 added your pros/cons to the main list (except the Amino JSON con which was already there worded differently)

I love this diagram

The composing environment is expected to know how to take an unsigned proto tx and a signature and turn it into signed bytes.

The verifier is expected to know how take signed bytes, generate a SignDoc and verify a signature.

  chain/verifier         p2p network            Composing environment      Signing environment
-----------------    ------------------        -----------------------    ---------------------

                                                 unsigned tx proto  --------->  SignDoc (JSON)
       verifier                                           |                          |
          ^                                               |                          | sign
          |                                               |                          |
          |                                               v                          v
  SignDoc (JSON) <--- signed tx bytes  <----------  signed tx proto  <-----------  signature
                                         serialize

I love this diagram

Glad to hear that.

Where I am heading with this is: proto->JSON serialization is going to be non-trivial, but I am sure it can be done as described in (1), (4) or (5). The reverse operation (deserializing JSON->proto), however, is specified very openly, allowing all kinds of JSON variants that lead to the same proto document. This starts with allowing both numbers and strings to deserialize to int32, fixed32, uint32, int64, fixed64, uint64 ("Either numbers or strings are accepted.") and just gets more complicated when allowing different RFC3339 timezones as an input for proto's Timestamp ("Offsets other than "Z" are also accepted."; how to handle perfectly valid RFC3339 leap seconds?). I'm not saying multiple JSON representations that decode to the same SignDoc are necessarily insecure. But if this mapping needs to be supported, we'll have to do much more work specifying all the edge cases. (I don't buy the idea of getting feature X from library Y for free.)
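
For example (hypothetical field names), under the proto3 JSON rules quoted above both of these documents decode to the same proto document:

{"sequence": "3", "createdAt": "2020-05-04T12:00:00Z"}
{"sequence": 3,   "createdAt": "2020-05-04T14:00:00+02:00"}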

When the JSON representation is only used for signing, we lose the current flow of broadcasting signed JSON to the REST server, which is a good thing in my opinion. I believe a client (tx composing and broadcasting environment; no privkey here) should be able to operate on proto, given a Cosmos specific wrapper around a general purpose proto lib. But I want to make sure there is consensus on this.

I do want to re-iterate that there is something pretty elegant about (3) - just signing the raw binary.

All of the JSON solutions including the standards-based approach (1) require both a) a fair amount of additional client library support and b) substantial auditing to check for edge cases and malleability issues.

It seems that the biggest benefit of the JSON solutions is that we could just show raw JSON to users of the ledger if the ledger app doesn't have the full proto schema. But this convenience does come at a cost elsewhere.

With approach (3), you have both the least surface area for transaction malleability issues and the easiest implementation for composing and verification environments.

For hardware signing environments, there is going to be complexity whichever approach is used. Is the benefit of being able to show raw JSON as a fallback worth all the additional complexity elsewhere?

yes the whole strategy of signing the raw binary produces a system that is far too cumbersome to extend.

an isolated signing environment brings little benefit if you don't have access to secure display to confirm what you are signing.

The weak link in blockchain protocols are the humans that interact with them.

I can't emphasize enough that I am overwhelmingly opposed to signing non-self describing dataformats.

I'm completely with @aaronc here (and @iramiller and others). Whatever I write about JSON signing is to be interpreted as "If JSON signing were required, what would it look like?". From the beginning I favoured choosing one: either commit 100% to the protobuf document (i.e. sign the native byte representation of the protobuf doc) or commit 100% to the JSON document (i.e. store JSON in transaction history). Any translation between the two will cause friction for all implementations as well as a serious specification overhead.

For historic reasons there is a general purpose Ledger app "Cosmos (ATOM)" that seems to sign arbitrary message types and seems to work for any chain. But is this set in stone? Does this make sense at all? We already spoke about the poor user experience when displaying raw JSON on a Ledger. How confident can you be signing a moderately complex message like

{"account_number":"4","chain_id":"testing","fee":{"amount":[{"amount":"12500","denom":"ucosm"}],"gas":"500000"},"memo":"Create an ERC20 instance for JADE","msgs":[{"type":"wasm/instantiate","value":{"code_id":"1","init_funds":[],"init_msg":{"decimals":18,"initial_balances":[{"address":"cosmos1pkptre7fdkl6gfrzlesjjvhxhlc3r4gmmk8rs6","amount":"189189189000000000000000000"},{"address":"cosmos17d0jcz59jf68g52vq38tuuncmwwjk42u6mcxej","amount":"189500000000000000000"}],"name":"Jade Token","symbol":"JADE"},"label":"JADE","sender":"cosmos1pkptre7fdkl6gfrzlesjjvhxhlc3r4gmmk8rs6"}}],"sequence":"3"}

without tooling to interpret and display this message? So I wonder, is a hardware signer really supposed to work for arbitrary messages?

The other thing I'd like to question is the use of a single Ledger app for multiple chains. E.g. Enigma uses the Cosmos Ledger app, but is that a good idea? The app uses the HD derivation path m/44'/118'/0'/0/a where 118 is the coin index for ATOM, so this is an app designed for signing on the Cosmos Hub. Re-using the same app on a different chain is at the very least a privacy issue: different Bech32 address prefixes hide the fact that a user uses the same public key on multiple chains and can easily be linked across chains.

Just my 2cents. No idea how to weight those arguments.

I was wondering, if all we need is a simple human readable text dump of the proto document, why would that be JSON? If you are curious, you can look into my Note on alternative text dumps of a proto document for Cosmos SDK signing. Since I'm not convinced by the result, I'm not inlining it here. See it as brainstorming or a way to challenge the status quo.

Yeah I was starting to think the same thing.

If we want to optimize for human-readable, why not go all the way?

We could output YAML with Title Case fields:

From: cosmossdn248hsdgsdg
To: cosmossdgheg8hsdgoet3
Amount: 10atom

That would be pretty easy to parse and display on a ledger.

Or why not just embed ICU MessageFormat strings as extensions in the .proto files?

Something like:

Send {amount, coins} from {from, address} to {to, address}

would generate:

Send 10 atoms from cosmossdn248hsdgsdg to cosmossdgheg8hsdgoet3

So I like this idea of having an embedded markup language to replace JSON as a tx encoding language; this is something I've been interested in for several years. It might even be something we could standardize with other protocols.

I'm also saying that if we have to get every custodial solution to do a major upgrade of their software and push a major change to the Ledger app through their process before the Cosmos Hub upgrades, the protobuf changes will have to be torn out and put in a post-IBC upgrade in 2021.

There will either be a JSON signing mechanism for protobuf or there won't be an upgrade of the Cosmos Hub to protobuf anytime soon.

The hard requirement is the next protobuf upgrade must have the smallest possible breaking changes in signing.

We can plan future improvements that make a large scale change but expect something like 1 year of planning.

@zmanian I don't think anyone is arguing that we disable amino signing before the ecosystem is ready. At least I haven't been. There should be a legacy signing mode and /tx/encode should convert an amino tx to a protobuf tx. We deprecate later, maybe after a year.

I do think we should enable a pure protobuf solution side-by-side as soon as protobuf goes live.

Here's what I propose as a solution. We create a signature structure like this:

message Signature {
    PubKey pub_key = 1;
    bytes signature = 2;
    SignMode mode = 3;
}

enum SignMode {
    BASIC = 0;
    LEGACY = 1;
    EXTENDED = 2;
}

BASIC is approach (3), just signing the raw proto bytes that get passed in the Anys. It's the easiest to implement and the only solution which has an almost zero transaction malleability surface. This unblocks developers who want to use pure proto.

LEGACY is amino JSON for compatibility as described above.

EXTENDED is a human-readable encoding that may actually get fleshed out later. One thing I want to note about this human-readable encoding is that the more involved it gets (say MessageFormat), the more transaction malleability could be an issue (I can think of several issues right away) and the more auditing/static linting that will be needed. But there is a pretty easy solution, just always concat the raw bytes from the BASIC approach (3) with the human readable text in the SignDoc. Then I don't think malleability is an issue no matter how complex the human readable format is.

So SignDoc could look like:

message SignDoc {
    Any msgs = 1;
    Fee fee = 2;
    string memo = 3;
    string chain_id = 4;
    uint64 account_number = 5;
    uint64 sequence = 6;
    string human_readable_text = 7;
}

So the human_readable_text just gets optionally appended to the SignDoc when using EXTENDED mode. Then the human readable text could be anything. We could support multiple formats. Maybe we use MessageFormat and support translations to 6 languages! I don't know...

But this human readable solution can come later. We start with the easy, straightforward approach that satisfies compatibility (Amino) and pure proto.

How does that sound?

This sounds promising.

What I am hoping is that we can build a language where you can take a byte stream + a markup stream and then securely display the contents of the bytes stream without full deserialization.

Probably you need to commit to the markup stream during signing and verifying but this should be great.

I have been an advocate of the raw byte any message and signing approach for many reasons (concise, secure, etc) which I have tried to express in comments here and in other places. For the uses I have and foresee it seems to be the most elegant and straightforward approach.

The signature enum mentioned above seems perfect for meeting the migration need and preserving the ability to explore other signature methods that may be available in the future. Discipline will be required to ensure this flexibility does not lead to retention of options that are poor choices for security in the interest of backwards compatibility. That is a sad crypto story that has been played out many times with tragic consequences.

That is a sad crypto story that has been played out many times with tragic consequences.

@iramiller could you say more about what you mean here?

Glad to see there's some alignment around moving forward with this proposal. I think it unblocks the process at the present moment and leaves the door open for a more ideal alternative.

I am planning to start writing this up as an update to ADR 020 and will share that for comments when it's ready.


If we are going to move forward with a human-readable format, I do want that to be a project that comes to fruition sometime in the next few months (not years) after the initial .proto release. In the interest of being concrete, are either of the options I presented - YAML and MessageFormat - in the rough ballpark of the human readable format we want? @zmanian ? Do we want something human-readable, but structured like YAML? A human readable sentence/statement like MessageFormat? Both?

Also one small tweak to the SignDoc above - we don't want human_readable_text as a field on it. That will force embedded devices to still parse protobuf! Instead we can just concatenate the human readable text and protobuf binary and use length-prefixing or null-terminated strings so that parsing out the text part is trivial.
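
A minimal Go sketch of that concatenation, assuming a varint length prefix (purely illustrative, not a finalized format):

package signing

import "encoding/binary"

// extendedSignBytes prepends a varint length prefix to the human-readable
// text and appends the raw SignDoc bytes, so an embedded signer can peel
// off and display the text without parsing any protobuf.
func extendedSignBytes(humanReadableText string, rawSignDoc []byte) []byte {
    out := binary.AppendUvarint(nil, uint64(len(humanReadableText)))
    out = append(out, humanReadableText...)
    return append(out, rawSignDoc...)
}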

Another option I am somewhat loathe to suggest due to their long history of never gaining traction is canonical s-expressions (a.k.a. csexps).

They do fit the bill of being a "human readable" (for certain definitions of that phrase) format with a relatively straightforward encoding which is both truly canonical and "binary safe" in ways JSON is not.

For these sorts of applications they're perhaps most notable for their usage in SPKI/SDSI.

The SignMode probably makes sense in general.

I assume the only reason to use Amino JSON would be to keep a 100% compatible signing solution for existing signers. They'd need to be unaware of the existence of the signing mode, which works with the Signature layout from above. This would require a few things to be specified:

  • A map from Any type URLs to Amino types
  • A list of custom proto types with explicit JSON representations (like e.g. addresses that result in bech32; what about timestamps?)
  • All the nullability rules
  • A model to byte spec like https://gibson042.github.io/canonicaljson-spec/ for Amino
  • ... (probably more – I never really worked with Amino)

Would this also require composing new transactions in Amino JSON and as a consequence a conversion back to protobuf?

If Amino JSON signing existed, is there still a need for a second kind of signable text dump format (called EXTENDED above)?

This would require a few things to be specified:

  • A map from Any type URLs to Amino types
  • A list of custom proto types with explicit JSON representations (like e.g. addresses that result in bech32; what about timestamps?)
  • All the nullability rules
  • A model to byte spec like https://gibson042.github.io/canonicaljson-spec/ for Amino
  • ... (probably more – I never really worked with Amino)

All of this is basically specified in the existing Amino go implementation and clients have as far as I know manually implemented each type they support in JS.

Would this also require composing new transactions in Amino JSON and as a consequence a conversion back to protobuf?

Yes, we would need to implement it at the /tx/encode endpoint to support existing clients. Because all proto types would also support amino encoding, it's mainly a matter of copying the data from the existing tx to the new tx. It shouldn't be super involved.

If Amino JSON signing existed, is there still a need for a second kind of signable text dump format (called EXTENDED above)?

I really don't think Amino JSON is a good long term solution. As you have outlined above, there are a lot of things needed for Amino JSON that aren't defined in the .proto files or that differ significantly (like type URLs). If we are supporting a signable human readable text format, I want it to be derived directly from information in the .proto files - we can use extensions if needed but it shouldn't be a different format. Also, it seems like we've already established that JSON isn't the best if we actually want something human readable. But if it is JSON, please some variation of proto JSON.

All of this is basically specified in the existing Amino go implementation and clients have as far as I know manually implemented each type they support in JS.

hmm, [censored thoughts about Amino]. So in this scenario you cannot convert a proto document to an Amino JSON representation without converting it back to Go structs as an intermediate step?

If that's the case, then we have a clear differentiation for the third sign mode type: a textual representation of a proto document, not a Go struct.

the property we want to enforce is that the verifier should be checking that any textual representation of the protobuf matches the data and semantics of the protobuf.

hmm, [censored thoughts about Amino]. So in this scenario you cannot convert a proto document to an Amino JSON representation without converting it back to Go structs as an intermediate step?

Yes, without a lot of infrastructure which doesn't exist

Wouldn't it be a missed opportunity to not drop Amino signing now, at a point where a massively breaking update is coming anyways? Keeping Amino signing makes the protobuf migration fragile, as the standard tooling (proto blob + .proto schema definition + maybe custom serializers for certain well known types) is not sufficient to verify a signature. We're leaving the land of specifications and cross-language support again and depend on gogoproto's casttype annotation, the Go types like e.g. AccAddress and a custom JSON serialization for each type. There is just no in-spec way to encode a bytes field as bech32 without having a specific type. The resulting signature does not sign a proto document, but a Go structure. Maintaining this infrastructure is a heavy burden, and it excludes signature verifiers other than the reference implementation. My gut feeling is it also shifts the development focus in the wrong direction: what's a "message"? A Go struct or a proto document?

This seems to be turning into a political challenge rather than a technical one. On the one side, there are the ATOM holders and their existing signing solutions (side note: not a single private or public key would change when the signing solution changes). On the other side there are application specific blockchain creators waiting to unleash the potential of protobuf message types. And somewhere in-between there is IBC. That's at least what I see. I cannot contribute much to those challenges as I'm neither in charge of nor aware of any kind of development or migration roadmap. But it would probably make sense to take a step back and review the requirements.

As I understand it, not supporting amino json as an option (not the default) would delay protobuf by maybe a year. Verification other than the reference implementation is already not possible. We're trying to move there as soon as we can. Zones also do not need to support amino signing. I think we should move forward, however, with a conversation about what to replace amino json with that's human readable so that the ledger, etc. can migrate sooner rather than later and other signature verification solutions can work.

@webmaster128

On the one side, there are the ATOM holders and their existing signing solutions (side note: not a single private or public key would change when the signing solution changes).

@aaronc

As I understand it, not supporting amino json as an option (not the default) would delay protobuf by maybe a year.

The only arguments that it is impossible to leave amino json behind have come from @zmanian. It would be nice to get more evidence/voices on this one. I agree that this is breaking for the ledger app, but it is also possible to write a ledger app that does handle the other proposed signing formats. I doubt, however, that changing the ledger app will take a year if the work is funded and high priority (e.g. ICF funds it, or Cosmos Hub governance).

The fundamental missing info is: Besides the ledger app and some existing go and js code, what else signs cosmos-sdk transactions? And what are their requirements? This gives us an objective point to analyze the "delay by 1 year" part. I do agree it will cause more work (ledger rewrite), but if that starts in the next few weeks, I believe this would be ready before the protobuf/ibc/etc work is audited and ready to go on the cosmos hub.

However if there are any other custom solutions, that does change the equation.

From what I have inferred from various chats, a number of validators have custom (or shared?) signing solutions based on YubiKey. But AFAIK, those are signing the tendermint block headers, which is a different discussion - we don't have to change that signing behavior when changing the Cosmos-SDK transaction signing.

I think the enum solution is a workable solution if such a large scale migration really is impossible in a realistic timeframe, but I would love to understand why.

Here is the size of the ecosystem that will be broken by changing the signing.

  1. Changes to the Ledger app have typically taken months. We would also have to create a new contract through the ICF for Juan to do this work. We have been iterating on changes to get out of developer mode for more than 6 months. Changes to the signing system will reset all of that work.

  2. There are more than 50 exchanges and custodians that support the Cosmos Hub. Almost every one has a unique implementation and integration with their custodial systems. Some of these implementations run within the secure environments of their HSMs. I've seen implementations in Java, Rust, Go, C++ and C of the signing implementation.

  3. There are at least a dozen non-custodial wallets that support the Cosmos Hub.

A coordinated change across 100 different software systems will take months to a year to execute.

The people who maintain these pieces of software aren't going to show up in this issue and explain how breaking the signature system will affect them, but they are by far some of the biggest stakeholders in these changes.

Thank you @zmanian
I was not aware of 2, only 1 and 3. It does make sense now to maintain backwards compatibility in signing.

I've seen implementations in Java, Rust, Go, C++ and C of the signing implementation.

The signing of raw proto messages will be a huge simplification for these non-golang environments though, so when they do make changes to upgrade it will be a breath of fresh air.

In the mean time the signature type option that supports a transition period provides the flexibility for these environments to plan a migration path that works with their environment. Sure the transitional period supporting two parallel efforts is quite a bit more work, but this is the gift of compatibility being offered to those that invested early. This seems very generous for a project that hasn't even reached a 0.5 release yet. No one building on a platform with less than 1.0/LTS should expect the environment to be stable and each should have an investment plan to put in the work to keep up with the rapid pace of updates.

Thanks for all the feedback here.

This issue was discussed in more detail with @zmanian, @aaronc, @webmaster128 (among others) on our Cosmos SDK grooming call today.

We decided to go with @aaronc's enum proposal as described above:

message Signature {
    PubKey pub_key = 1;
    bytes signature = 2;
    SignMode mode = 3;
}

enum SignMode {
    BASIC = 0;
    LEGACY = 1;
    EXTENDED = 2;
}

Followup actions are for @aaronc to update ADR 20 to incorporate this proposal, and separately to work with @zmanian on a proposal for the human-readable translation to/from raw-bytes that will live in Interchain Standards.

My PR to update ADR 020 to reflect the signing format/Any as discussed in this thread is R4R: #6111

Would appreciate a review from anyone involved in this discussion.

There are a few things that didn't come up in the discussion but which I thought were good to include:

  • separating TxBody from Tx and SignDoc
  • separating SignerInfo from signatures
  • SignDocRaw and TxRaw messages for doing the actual signing/verifying with minimal malleability
  • an extension_options field

One thing I want to note about protobuf canonicalization that I forgot when writing up these pros and cons is that it causes transaction verifiers to reject messages with unknown fields (as they won't show up in the canonical version). I believe that's generally the correct behavior for a blockchain - if I set a field which the server doesn't understand, the server should report an error rather than silently ignore.

So this is one benefit of (1) and (4), that (3) (raw bytes signing) doesn't have. Although I should note that maybe canonicalization isn't the best approach anyway.

One particular case that comes up in #6111 is that I'm advocating that the core Tx type proactively add tx options even if they aren't supported on all servers. For instance, timeout_height (#6089) gives clients the expectation that a tx will time out at some height, but if it isn't supported a server should actually reject the tx even if it does parse the protobuf. I'm also suggesting a generic extension_options field which could have anything.

So I'm just noting that we likely need some generic mechanism to reject unhandled fields/options. This is probably true regardless of which signing method we use even though canonicalization (even with Amino JSON) was previously one approach for doing this.

Two possible way to approach this that could work together are:

  • a generic parser with the raw .proto definitions checks messages for unknown fields and rejects them during signature verification
  • every processing option found on a tx needs to have a registered ante handler, or the tx gets rejected. Maybe there is some way for the handlers to "consume" options, and if there are unconsumed options at the end that's an error

An alternative is to reconsider one of the other options (besides 3) as default. But this places added canonicalization complexity on clients (whereas checking messages should be the verifier's responsibility) and also opens up the malleability issue again.

One thing I want to note about protobuf canonicalization that I forgot when writing up these pros and cons is that it causes transaction verifiers to reject messages with unknown fields (as they won't show up in the canonical version).

Supporting unknown fields in this sort of scheme is the raison d'etre of Veriform, as it were. However, I think you can support them as well in your scheme if you have a schema-independent message parser/verifier which ensures messages are in canonical form.

Edit: Just remembered this isn't possible because Protobufs don't have a wire type for nested messages. Alas.

Edit again: Kenton Varda had some suggestions on that same thread regarding how to do Protobuf canonicalization for these purposes, however they feel a bit heuristic IMO.

I believe that's generally the correct behavior for a blockchain - if I set a field which the server doesn't understand, the server should report an error rather than silently ignore.

X.509 (and Veriform) both solve this problem via the inclusion of a "critical" flag on message fields (or extensions, in X.509 parlance). It's possible to add non-critical fields in a backwards compatible way, but unknown fields/extensions flagged as critical are rejected.

X.509 (and Veriform) both solve this problem via the inclusion of a "critical" flag on message fields (or extensions, in X.509 parlance). It's possible to add non-critical fields in a backwards compatible way, but unknown fields/extensions flagged as critical are rejected.

In this scenario, though, there would be no way for a verifier to know if an unknown field were critical in a later version of the protocol (that the client supports but server doesn't). Critical flags would only help if the server possessed the newer schema but couldn't process it.

Yeah, unfortunately a critical flag is difficult to implement in Protobufs. Veriform encodes it directly into the wire type.

The only things I can think of are some sort of Any wrapper on extension fields which also adds a "critical" boolean flag, or a preprocessor/postprocessor which encodes it into the wire type, both of which are ugly solutions.

I believe this extension/critical/processor thing goes back to the fundamentals... does the signature process need to concern itself with the contents of every message, or should that be the responsibility of the message handler?

I believe that the signature handler should not concern itself with the contents of the messages it is checking the signature of-- no more than a mailing envelope cares about any papers it may contain. The envelope is satisfied with a correctly formatted address and valid postage. So should the signature verifier be satisfied with a proper signature and leave the message validation to downstream code.

For verifiers of X.509 signed objects, critical extensions ended up being pretty important.

Without "critical" extensions, it's not possible to safely refine the protocol to add features which provide constraints/caveats on the authority the signed object designates in a backwards-compatible manner.

The alternative to this approach discussed earlier is to fail closed on any unknown field, but that precludes any sort of backwards-compatible schema evolution/extensions, which I think wind up being pretty important.

I believe this extension/critical/processor thing goes back to the fundamentals... does the signature process need to concern itself with the contents of every message, or should that be the responsibility of the message handler?

I believe that the signature handler should not concern itself with the contents of the messages it is checking the signature of-- no more than a mailing envelope cares about any papers it may contain. The envelope is satisfied with a correctly formatted address and valid postage. So should the signature verifier be satisfied with a proper signature and leave the message validation to downstream code.

I agree that separating this from signature processing makes sense, but as @tarcieri is pointing out it does seem to be tightly coupled with encoding choices.

The alternative to this approach discussed earlier is to fail closed on any unknown field, but that precludes any sort of backwards-compatible schema evolution/extensions, which I think wind up being pretty important.

That's a good point and I can't really think of another backwards compatible alternative besides trying to overload Any with a flag or something similar.

I think for unknown fields we either need to make the choice that a) critical extensions are never added as regular fields or b) unknown fields always trigger failure.

One thing I want to reiterate is that this applies in two places: the top-level tx and the sdk.Msgs themselves. I could imagine some sort of overloaded critical Any thing working for the top-level tx, but for sdk.Msgs that seems pretty complex. So I think without having something built into the encoding standard, rejecting unknown fields may be the necessary compromise...

For verifiers of X.509 signed objects, critical extensions ended up being pretty important.

Without "critical" extensions, it's not possible to safely refine the protocol to add features which provide constraints/caveats on the authority the signed object designates in a backwards-compatible manner.

The alternative to this approach discussed earlier is to fail closed on any unknown field, but that precludes any sort of backwards-compatible schema evolution/extensions, which I think wind up being pretty important.

To clarify ... it isn’t that I feel these techniques are invalid or unimportant ... My point is that the signature handler cannot know of every field for every message type, nor should it attempt to parse the message(s) with only partial understanding. To compromise on this point breaks the separation of concerns and yields a strictly less extensible, more complex approach.

The assigned message handler for the type on the other hand absolutely should validate the message in depth and throw an exception if there is something it does not understand or cannot parse.

The assigned message handler for the type on the other hand absolutely should validate the message in depth and throw an exception if there is something it does not understand or cannot parse.

But the issue with protobuf is that once a message is parsed you just have a Go struct, and we don't currently have a mechanism for retrieving unknown fields - they're just discarded. So rejecting messages with unknown fields needs to happen higher up in the stack, at parse time.

Actually @tarcieri, one way we could flag critical fields in protobuf is just with field number ranges. Say 1-127 is critical (what people would choose by default), 128+ is non-critical.

Also if any of you get a chance to do a quick review of #6111 before we finalize it that would be greatly appreciated @tarcieri @iramiller

I think for unknown fields we either need to make the choice that a) critical extensions are never added as regular fields or b) unknown fields always trigger failure.

I'd suggest taking a look at Adam Langley's blog post about extensibility. I think it winds up being pretty important:

https://www.imperialviolet.org/2016/05/16/agility.html

Actually @tarcieri, one way we could flag critical fields in protobuf is just with field number ranges. Say 1-127 is critical (what people would choose by default), 128+ is non-critical.

That sounds great! Although to really make it work, you need a Protobuf implementation that allows you to extract the unknown fields so you can check if any are critical.

I think for unknown fields we either need to make the choice that a) critical extensions are never added as regular fields or b) unknown fields always trigger failure.

I'd suggest taking a look at Adam Langley's blog post about extensibility. I think it winds up being pretty important:

https://www.imperialviolet.org/2016/05/16/agility.html

Thanks for sharing. So I think the big takeaways are: keep the extension system simple - in protobuf this is basically adding new fields - and allow newer clients to work with older servers - hence the desire not to reject all messages that clients send with new fields, but only those flagged as critical.

Actually @tarcieri, one way we could flag critical fields in protobuf is just with field number ranges. Say 1-127 is critical (what people would choose by default), 128+ is non-critical.

That sounds great! Although to really make it work, you need a Protobuf implementation that allows you to extract the unknown fields so you can check if any are critical.

Yeah, we might need to do something hand-generated that takes the message descriptors and flags unknown critical fields.

I'm not terribly familiar with the Go Protobuf ecosystem, but I believe you can use goproto_unrecognized for this purpose?

I'm not terribly familiar with the Go Protobuf ecosystem, but I believe you can use goproto_unrecognized for this purpose?

Yeah unfortunately it would break other stuff to enable that so it would likely need to be a separate parser pass.
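
For illustration, here is a minimal sketch of what such a separate pass could look like, using Go's google.golang.org/protobuf/encoding/protowire package: walk the raw wire bytes, skip fields the schema knows about, and reject any unknown field whose number is at or below the critical threshold (127 or 1024, whichever convention we settle on). The function name and the known-field set are assumptions for the example, not SDK code.

```go
package unknownfields

import (
	"fmt"

	"google.golang.org/protobuf/encoding/protowire"
)

// rejectUnknownCriticalFields scans the raw wire-format bytes of a single
// message and fails if it finds a field number that is not in the known set
// and is at or below the critical threshold. Unknown fields above the
// threshold are tolerated (non-critical extensions).
func rejectUnknownCriticalFields(bz []byte, known map[protowire.Number]bool, criticalMax protowire.Number) error {
	for len(bz) > 0 {
		num, typ, n := protowire.ConsumeTag(bz)
		if n < 0 {
			return fmt.Errorf("invalid wire format: %w", protowire.ParseError(n))
		}
		bz = bz[n:]

		if !known[num] && num <= criticalMax {
			return fmt.Errorf("unknown critical field number %d", num)
		}

		// Skip the field's value, whether or not the field is known.
		m := protowire.ConsumeFieldValue(num, typ, bz)
		if m < 0 {
			return fmt.Errorf("invalid wire format: %w", protowire.ParseError(m))
		}
		bz = bz[m:]
	}
	return nil
}
```

A real pass would also need to recurse into nested messages (including the Any values wrapping each sdk.Msg), since unknown fields can appear at any nesting level.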

We'll be discussing the topics related to this, and more specifically any outstanding blocking issues with ADR020 tomorrow on our bi-weekly architecture review call.

https://hackmd.io/UcQfA-JRQjG1zfKKWxH1gw

Zoom link in the HackMD if anyone from this thread would like to join.

does the signature process need to concern itself with the contents of every message, or should that be the responsibility of the message handler?

Looking at the protobuf document, this is no issue: every message is stored in an Any field, and Anys are embedded as raw data into the containing document. So the signature handler only needs to understand the outer envelope (1-2 small layers). The rest is opaque data for the verifier, that can be passed to the type-specific message handler.
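
To make the envelope analogy concrete, here is a small standalone Go example using the standard anypb package. wrapperspb.StringValue merely stands in for a hypothetical sdk.Msg, and the outer TxBody type is omitted, since the point is only that the verifier sees a type URL plus opaque bytes.

```go
package main

import (
	"fmt"

	"google.golang.org/protobuf/types/known/anypb"
	"google.golang.org/protobuf/types/known/wrapperspb"
)

func main() {
	// A concrete message gets packed into an Any. StringValue stands in for a
	// hypothetical sdk.Msg here; the real envelope would be the tx body.
	msg := wrapperspb.String("send 10 tokens to bob")
	packed, err := anypb.New(msg)
	if err != nil {
		panic(err)
	}

	// The signature verifier only needs the outer envelope: the Any is a type
	// URL plus opaque bytes that are signed/verified as-is.
	fmt.Println("type URL:", packed.TypeUrl)
	fmt.Println("opaque payload size:", len(packed.Value))

	// Only the type-specific message handler unpacks and validates the payload.
	var inner wrapperspb.StringValue
	if err := packed.UnmarshalTo(&inner); err != nil {
		panic(err)
	}
	fmt.Println("handler sees:", inner.GetValue())
}
```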

But the issue with protobuf is that once a message is parsed you just have a Go struct, and we don't currently have a mechanism for retrieving unknown fields - they're just discarded. So rejecting messages with unknown fields needs to happen higher up in the stack, at parse time.

You need two parsing steps anyway (either explicitly or hidden by gogoproto): (1) the outer document including the Any field and (2) the message.

So the extra fields we are talking about arise when a client has a different message schema for the same type URL, with extra fields? Well, then I guess the chain is the source of truth and the client can add whatever data they want to a given message. But this is a matter of message type upgradeability, not signature verification.

Yes, this is about message upgradeability, not signature verification per se. Although the canonicalization approach to signature verification prevents upgradeability (in the direction of newer clients interacting with older servers).

So summary from our architecture review call today (https://hackmd.io/UcQfA-JRQjG1zfKKWxH1gw?view), which focused on ADR020 (#6111)

We had consensus on the approach laid out in this PR, with a few minor additions:

  • Sequence number should never be allowed to be 0 (the first time a user submits a transaction), but rather should start at 1
  • We will proceed with the proposal from @aaronc, in conversation with @tarcieri, to reserve field numbers 1-1024 for critical fields, allowing non-critical fields to use field numbers above 1024

Outstanding question:

  • We had yet to come to a conclusive decision as to whether the PublicKey in SignerInfo uses oneof vs Any to specify the type of public key algorithm used.

EDIT: It is expected that the resolution of this PublicKey question is not a blocker for ADR020, but will be discussed & resolved separately.

@aaronc Can you lay out more details for this outstanding point here?

There is a pretty big discussion that has been unfolding related to #6111 and #6174 in those PRs and now on discord. I think it has gotten to the point where it deserves its own list of alternatives with pros and cons. I will make my own attempt below.

The general context of this discussion is the UX of a multi-signer transaction. I will frame the various alternatives under three broad umbrellas - (1) Closed signers, (2) Open signers, (3) Open then closed signers.

(1) Closed signers (the current Cosmos SDK model)

Definition: the list of signers is fixed by the content of the messages. Extra signers are not allowed. Every signer must sign over the list of other signers

Multi-Signer UX:

  1. Tx composer proposes the transactions to possible signers and asks who will sign
  2. Tx composer composes messages with the list of signers that agreed to sign
  3. All signers that agreed to sign must sign
  4. Tx can be broadcast once the final signer signs

Pros:

  • Limiting the list of signers prevents vulnerabilities that can arise when malicious signers can add their signature to cause out-of-gas failures

Cons:

  • Limiting the list of signers means that the tx composer first needs to ask who will sign (phase 1) and then ask that same group of people to sign (phase 2)

Variation (a): Don't ask for pubkey and signing mode in step 1

This is effectively the current SDK model which doesn't include signing modes

Compared to variation (b):

  • Pro: Tx composer doesn't need to ask for pubkeys and signing mode in step 1, just account addresses
  • Con: Tx composer needs to over-estimate the gas limit because different pubkeys and signing modes may have different gas costs (possibly differing by orders of magnitude for nested multisigs)
  • Con: an attacker could manipulate mode and pubkey flags if vulnerabilities existed there (this could maybe be mitigated by making everyone sign their own pubkey and mode flags)

Variation (b): Ask for pubkey and signing mode in step 2

This is the model proposed in #6111

Compared to variation (a):

  • Pro: prevents attacker manipulation of mode and pubkey flags if vulnerabilities existed at that layer
  • Pro: Tx composer can accurately estimate gas limit because all the pubkey types and signing modes are known
  • Con: Signers need to share their pubkey in step 1 and also fix the signing mode they will use in step 3

(2) Open signers (the Weave model)

Definition: extra signers not specified by messages are allowed

This is Weave's model. (@webmaster128 if I get something wrong here please correct me)

Multi-Signer UX:

  1. Tx composer composes messages and shares with list of potential signers
  2. Any signer can sign and broadcast immediately; as long as there are enough required signers, the tx will succeed

Pros:

  • Much simpler workflow for, say, a 3-of-5 multisig - the tx composer does not need to know which 3 out of 5 signers will sign before composing the message

Cons:

  • Malicious actors can intercept txs and add extra signers that will cause out-of-gas errors while the fee payer still pays the fee. Basically a zero-cost transaction malleability attack

Variation (a): Original Weave model - no signature limit

As @webmaster128 shared, this had the following cons:

Before we discovered and fixed a bunch of attacks in late 2019, you could do the following:

  1. Spam a ridiculous amount of unnecessary data as part of a signature (since protobuf allows unnecessary fields)
  2. Add an arbitrary number of unneeded signatures for no reason
  3. Change signers[0], which was used as a default value in some places, since signatures are unordered. This allowed someone else to turn any signer into the fee payer, among other things.

Variation (b): Fixed Weave model

According to @webmaster128:

Attack 3 was a combination of carelessness and a type system that has no set type. The signers were originally designed as a set but then interpreted as an ordered list in some situations. This was solved by never using the order of signatures in the application.

A Cosmos SDK-style tx size limit did not exist. Attacks 1 and 2 were stopped by implementing a tx fee based on byte size. Now you can spam as much as you want, if you pay for it. Fees are calculated based on the expected number of signatures.

No gas system exists in Weave, but the tx size fee automatically puts a limit on the number of signatures. If someone adds too many signatures, the tx is rejected for being too big. Otherwise it will pass.
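
As a rough illustration of why a byte-size fee bounds signature spam, consider the sketch below; the constants and the fee function are invented for the example and do not reflect Weave's actual fee parameters.

```go
package main

import "fmt"

// With a per-byte fee, each extra signature has a direct cost, so spamming
// signatures is self-limiting. These constants are made up for the example.
const (
	feePerByte   = 10  // in some base token unit
	sigSizeBytes = 100 // approximate on-wire size of one signature entry
)

// txFee charges for the base tx bytes plus the bytes added by each signature.
func txFee(baseBytes, numSigs int) int {
	return feePerByte * (baseBytes + numSigs*sigSizeBytes)
}

func main() {
	for _, sigs := range []int{1, 3, 50} {
		fmt.Printf("%2d signatures -> fee %d\n", sigs, txFee(250, sigs))
	}
}
```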

I believe this is the actual issue: iov-one/weave#1091

While this solves some issues, it still appears to be exploitable: if the message were intercepted, an extra signer could still add themselves and cause problems for the fee payer. Is that true?

(3) Open then closed signers

This is the approach proposed in #6174

Multi-Signer UX:

  1. Tx composer composes messages without fixing the list of required signers and shares with a group of potential signers
  2. Any signers can sign whether or not they are required/expected
  3. As soon as there are enough signers, any signer can become the fee payer and sign and broadcast the final message that fixes the list of signers and the fee

Pros:

  • allows a UX very similar to (2) without the malleability attack surface
  • allows the fee payer to accurately estimate gas limit
  • allows the fee payer to know that they are paying for the verifications they want and not some random attacker's signatures

Cons:

  • compared to (2), requires the fee payer to be the last signer

Apologies for this being such a long post. I really want to get to the bottom of this and agree on an approach and move forward.

In my personal opinion, approach (3) seems to have the most benefits and least tradeoffs for multi-signer transactions. It allows the benefits of (2) without the downsides.

Any variation of (1) would be the status quo and acceptable. I don't see a huge difference between (1)(a) and (1)(b) - it's still the same number of phases and requires fixing signers before signing. I would prefer the guarantees of (b) but at this point whatever. Neither seems to actually give a very good multi-signer UX which I thought was the point of this.

That's why I recommend approach (3) #6174 - good security guarantees + good multi-signer UX.

I feel like 3 is the correct direction as well.

Thank you @aaronc for the great summary. Here are a few additions/remarks, followed by my personal vote.

(1)
Additional Pros:

  • The signature aggregation is fully parallel (i.e. signatures can be collected in any order since they don't depend on each other)

Regarding Con:

Limiting the list of signers means that the tx composer first needs to ask who will sign (phase 1) and then ask that same group of people to sign (phase 2)

The point does not apply to a multisig pubkey, which is a single signature that can be satisfied by any combination of signatures from the group that together make up a valid multi-pubkey signature. For the required signers of the individual messages, the list is deterministic already.

(1a)
Regarding Pro:

Tx composer doesn't need to ask for pubkeys and signing mode in step 1, just account addresses

The tx composer does not even need to ask for the addresses. They exist automatically as part of the messages.

(2)
Comment: Right, using the Weave model in the Cosmos SDK would be exploitable since an external party can add unnecessary signatures, causing the gas consumption to exceed the limit. If the manipulated tx is processed before the original tx, the tx fails but the gas cost is spent. In Weave this issue does not apply since there is no gas system.


There is a significant difference between (1a) and (1b): The list of required signer addresses is available for free locally in the messages. Needing to know the public key for each required signer address can only be automated if all pubkeys are stored on-chain, which is not the case for new accounts. The communication overhead of needing to know the signing modes of each signer is relevant as it increases the roundtrips of human communication. Additionally, it violates separation of concerns.

My vote goes to (1a) as it

  • is simple to implement
  • is close to the current SDK
  • has no known attack vector as long as all signing modes enabled on chain are secure
  • is most convenient for humans since no dependencies between signers exist

P.S.: there is a variant of (3), let's call it (4), which only allows a single top-level signature and keeps all other signatures at a lower level. It comes with the same two-phase signing process as (3) but simplifies everything. Most transactions only need a single top-level signature (which can be one from a multi-pubkey group), and the few transactions with multiple inner signatures can be processed in two steps. It is a radical change. Let me know if anyone is interested in learning more.

is most convenient for humans since no dependencies between signers exist

But that's not true because they still need to agree upon the list of addresses to include in the messages. If you wanted to coordinate 3 signers out of 5, you would still need to ask all 5, then get an ACK from 3 of them and then compose the messages and then have them sign it. I acknowledge that the weave model made this simpler and I think (3) gets pretty close.

Based on discussions on discord, I have updated my original PR #6111 with the following:

  • made a small modification to my initial approach where AuthInfo is split out from TxBody to gracefully enable (3) and/or (1)(a) in the future if so desired
  • approaches (3) as well as (1)(a) are mentioned as possible future improvements
  • handling of signing modes for the SDK's nested multisig pubkeys added
  • unknown field handling added

I agree with @alexanderbez that we should try to keep our initial implementation as simple and as secure as possible. Even if we did come to consensus that (3) provides a better UX, it is beyond the scope of our current work to address this. But I think it's good to document the alternatives and then see what users ask for in practice. (1)(b) seems to be the simplest to implement with the most security guarantees and wouldn't preclude (3) and/or (1)(a) from existing as alternate signing modes in the future.

What is left to get the work started? What needs to be merged?

What is left to get the work started? What needs to be merged?

#6111, then maybe an init PR with the proto file and then implementing the signing logic.

is most convenient for humans since no dependencies between signers exist

But that's not true because they still need to agree upon the list of addresses to include in the messages. If you wanted to coordinate 3 signers out of 5, you would still need to ask all 5, then get an ACK from 3 of them and then compose the messages and then have them sign it.

There are two types of multiple signatures currently discussed:

  1. The list of required message signers, which is calculated from the GetSigners result of each message.
  2. A multisig account type that has a separate group account and thus a separate address. It has a special type of pubkey that encodes the participants and threshold.

The latter results in a single (nested) top-level signature, i.e. it is one signer no matter which group participants signed. So no coordination is required on which participants will sign.
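
A rough sketch of that idea (illustrative types only, not the SDK's actual multisig pubkey or crypto): the group pubkey encodes its participants and threshold, and verification of the single top-level signature succeeds once enough participants' inner signatures check out.

```go
package main

import (
	"bytes"
	"fmt"
)

// PubKey is a stand-in interface for this sketch; real keys would be
// secp256k1/ed25519 pubkeys from the SDK's crypto packages.
type PubKey interface {
	Verify(msg, sig []byte) bool
}

// dummyKey "verifies" by comparing the signature to a fixed byte string,
// just so the example runs without a real crypto library.
type dummyKey struct{ secret []byte }

func (k dummyKey) Verify(msg, sig []byte) bool { return bytes.Equal(sig, k.secret) }

// ThresholdPubKey encodes the group participants and the threshold, so the
// whole group appears as a single signer with a single (nested) signature.
type ThresholdPubKey struct {
	Threshold int
	PubKeys   []PubKey
}

// Verify succeeds if at least Threshold participants produced valid inner
// signatures. sigs is parallel to PubKeys; a nil entry means "did not sign".
func (k ThresholdPubKey) Verify(msg []byte, sigs [][]byte) bool {
	if len(sigs) != len(k.PubKeys) {
		return false
	}
	valid := 0
	for i, sig := range sigs {
		if sig == nil {
			continue
		}
		if !k.PubKeys[i].Verify(msg, sig) {
			return false
		}
		valid++
	}
	return valid >= k.Threshold
}

func main() {
	group := ThresholdPubKey{
		Threshold: 2,
		PubKeys:   []PubKey{dummyKey{[]byte("a")}, dummyKey{[]byte("b")}, dummyKey{[]byte("c")}},
	}
	msg := []byte("tx sign bytes")
	// Any 2 of the 3 participants satisfy the group; the outer tx sees one signer.
	fmt.Println(group.Verify(msg, [][]byte{[]byte("a"), nil, []byte("c")})) // true
	fmt.Println(group.Verify(msg, [][]byte{[]byte("a"), nil, nil}))         // false
}
```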

The first one is not suited to implementing n-of-m multisig at all because the nature of the GetSigners() method makes flexible signers impossible. It is just different people who authorize different things. This is needed to do e.g. atomic trades where Alice signs a token send message that sends tokens to Bob, Bob signs a smart contract execution that transfers asset X to Carl, and Carl signs 3 messages sending tokens to Alice, Bob, and a charity. This gives us 5 messages signed by 3 pre-specified entities. The state of the current SDK is that for each of those parties you only need to know the addresses, which are part of the messages.
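
To illustrate the first kind, here is a minimal sketch of how the required signer list falls out of the message content itself; the Msg interface and MsgSend type are simplified stand-ins for the SDK's sdk.Msg and its GetSigners method, using plain strings instead of sdk.AccAddress.

```go
package main

import "fmt"

// Msg mirrors the relevant part of the SDK's sdk.Msg interface for this sketch.
type Msg interface {
	GetSigners() []string
}

// MsgSend is a toy message type: only the sender must sign it.
type MsgSend struct{ From, To string }

func (m MsgSend) GetSigners() []string { return []string{m.From} }

// requiredSigners deduplicates the signers of all messages while preserving
// first-seen order. The required signer list is derived locally from the
// transaction content, so no extra communication round is needed to learn it.
func requiredSigners(msgs []Msg) []string {
	seen := map[string]bool{}
	var out []string
	for _, m := range msgs {
		for _, s := range m.GetSigners() {
			if !seen[s] {
				seen[s] = true
				out = append(out, s)
			}
		}
	}
	return out
}

func main() {
	msgs := []Msg{
		MsgSend{From: "alice", To: "bob"},
		MsgSend{From: "bob", To: "carl"},
		MsgSend{From: "carl", To: "alice"},
	}
	fmt.Println(requiredSigners(msgs)) // [alice bob carl]
}
```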

Closing this now, as this issue was targeting an update of ADR020.

#6213 will be used for tracking the actual implementation of the updated ADR as described in #6111