cfrg/draft-irtf-cfrg-vdaf

Define more common types

Closed this issue · 13 comments

Some of the commonly used types are not explicitly defined using TLS syntax, or mentioned in "Conventions and Definitions". For example:

  • prep share
  • prep msg
  • input share
  • public share

while similar ones like OutShare and Measurement are.

Other types, like Prep, are treated as types, but their content is not specified where they are first mentioned, only later, e.g. in https://github.com/cfrg/draft-irtf-cfrg-vdaf/blob/main/draft-irtf-cfrg-vdaf.md#communication-patterns-for-preparation-vdaf-prep-comm.

Typically you won't see cryptographic algorithms in CFRG documents described in TLS syntax (many predate RFC 8446), but I agree this would add clarity to the draft. As a bonus, it would allow us to remove some auxiliary functions that deal with serialization.

I think the only question is whether TLS syntax is expressive enough. I think we should investigate.

I agree. If TLS syntax is not mandatory or not fit for purpose, perhaps a more expressive general-purpose language would make the description easier to understand.

Related to this, in the generic FLP section (https://github.com/cfrg/draft-irtf-cfrg-vdaf/blob/main/draft-irtf-cfrg-vdaf.md#validity-circuits-flp-generic-valid), many types should have been defined and used, but instead a generic Vec[...] is used:

Valid.encode(measurement: Measurement) -> Vec[Field] returns a vector of length INPUT_LEN representing a measurement.

... many types should have been defined and used, but instead a generic Vec[...] is used

Which types do you think are missing?

I've started digging into this but I'm confused about what the actual ask is here. Currently:

  1. If a value must be written to the wire in a protocol that uses VDAF, then our convention is to treat it as Bytes. This includes the prep share, prep message, input share, and public share.
  2. Otherwise if a value is not necessarily written to the wire, then our convention is to define an explicit type for it so that we can use it. This includes the prep state, output share, and aggregate result.

It sounds to me that folks would favor replacing the Bytes with an explicit type for the first category, i.e., prep share, prep message, input share, and public share?

sounds to me that folks would favor replacing the Bytes with an explicit type for the first category, i.e., prep share, prep message, input share, and public share?

For me, yes.

I think my ask is similar to #58. Defining explicit types (and specifying the encoding if necessary) makes the text more readable and also more flexible.

Maybe this should be a separate issue, but should the Prep type be renamed to PrepState? That's what it's meant to represent: https://github.com/cfrg/draft-irtf-cfrg-vdaf/blob/main/draft-irtf-cfrg-vdaf.md#definition-of-vdafs-vdaf

Prep | State of each Aggregator during Preparation ({{sec-vdaf-prepare}})

The ping-pong topology also introduces a State type, which looks like an enum with associated data of type Prep, but it's not fully defined in the text:
https://github.com/cfrg/draft-irtf-cfrg-vdaf/blob/main/draft-irtf-cfrg-vdaf.md#ping-pong-topology-only-two-aggregators

def ping_pong_req(Vdaf,
                  agg_param: Vdaf.AggParam,
                  state: State,
                  inbound: Optional[Message],
                  ) -> (State, Optional[Message]):
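For illustration, here is a minimal sketch of how such a State type could be spelled out as an enum with associated data. The variant names (Start, Continued, Finished, Rejected) and their fields are assumptions made for this example, not definitions taken from the draft text.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Start:
    """Preparation has not begun; carries no data."""


@dataclass(frozen=True)
class Continued:
    """Preparation is in progress; carries the Aggregator's Prep state."""
    prep_state: Any


@dataclass(frozen=True)
class Finished:
    """Preparation succeeded; carries the resulting output share."""
    out_share: Any


@dataclass(frozen=True)
class Rejected:
    """Preparation failed; carries no data."""


# The enum-like State type: exactly one of the variants above.
State = Union[Start, Continued, Finished, Rejected]


def describe(state: State) -> str:
    """Dispatch on the variant, as a function taking a State presumably would."""
    if isinstance(state, Start):
        return "start"
    if isinstance(state, Continued):
        return "continued"
    if isinstance(state, Finished):
        return "finished"
    return "rejected"
```

Writing the variants out this way makes it explicit which of them carry a Prep value and which carry nothing.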

Renaming Prep to PrepState is a good idea 👍

I'd like to drill down on the question about explicit types further. What would be the purpose? To make the encoding format explicit, or to be consistent with the other values?

The question in #58 is whether this document should be prescriptive about encoding at all. (We currently are.)

I'd like to drill down on the question about explicit types further. What would be the purpose? To make the encoding format explicit, or to be consistent with the other values?

the latter.

Got it. Let me drill down one step further, and let's focus on input shares: In your opinion, should this document (1) define a type for the input share, (2) define the wire format of an input share, or (3) both? If (3), what would be the benefit of also defining a type, if we've already specified the wire format?

Here's a concrete proposal.

First, change the type of the input share from bytes to Vdaf.InputShare. For Prio3 it would be something like

tuple[
  Union[bytes[Prio3.Prg.SEED_SIZE], Vec[Prio3.Flp.Field[Prio3.Flp.INPUT_LEN]]],
  Union[bytes[Prio3.Prg.SEED_SIZE], Vec[Prio3.Flp.Field[Prio3.Flp.PROOF_LEN]]],
  Optional[bytes[Prio3.Prg.SEED_SIZE]],
]

Second, in a separate section, perhaps towards the bottom, specify the wire format of the input share. For Prio3 it would be something like (in TLS syntax):

struct {
  select (is_leader) {
    case true:
      Prio3.Flp.Field meas_share[Prio3.Flp.INPUT_LEN];
      Prio3.Flp.Field proof_share[Prio3.Flp.PROOF_LEN];
    default:
      uint8 k_meas_share[Prio3.Prg.SEED_SIZE];
      uint8 k_proof_share[Prio3.Prg.SEED_SIZE];
  };
  select (use_joint_rand) {
    case true: uint8 k_blind[Prio3.Prg.SEED_SIZE];
    default: Empty;
  };
} Prio3InputShare;
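As a rough sketch of what this struct would mean on the wire: fixed-length vectors in TLS syntax carry no length prefix, so the encoding is just the fields concatenated in order. The constants below are hypothetical placeholders, and the fixed-width little-endian field element encoding is an assumption. The bracketed lengths are read here as element counts, as the proposal seems to intend (in RFC 8446 they count bytes).

```python
from typing import List, Optional

# Hypothetical parameters, chosen only for illustration.
SEED_SIZE = 16     # stands in for Prio3.Prg.SEED_SIZE
ENCODED_SIZE = 8   # assumed bytes per encoded field element
INPUT_LEN = 2      # stands in for Prio3.Flp.INPUT_LEN
PROOF_LEN = 3      # stands in for Prio3.Flp.PROOF_LEN


def encode_field_vec(vec: List[int], expected_len: int) -> bytes:
    """Encode field elements back to back, with no length prefix."""
    if len(vec) != expected_len:
        raise ValueError("unexpected vector length")
    return b"".join(x.to_bytes(ENCODED_SIZE, "little") for x in vec)


def check_seed(seed: bytes) -> bytes:
    """Return the seed unchanged after checking its fixed size."""
    if len(seed) != SEED_SIZE:
        raise ValueError("unexpected seed length")
    return seed


def encode_leader_share(meas_share: List[int],
                        proof_share: List[int],
                        k_blind: Optional[bytes]) -> bytes:
    """The is_leader == true arm, followed by the optional blind."""
    out = encode_field_vec(meas_share, INPUT_LEN)
    out += encode_field_vec(proof_share, PROOF_LEN)
    if k_blind is not None:  # use_joint_rand == true
        out += check_seed(k_blind)
    return out


def encode_helper_share(k_meas_share: bytes,
                        k_proof_share: bytes,
                        k_blind: Optional[bytes]) -> bytes:
    """The default arm (seeds only), followed by the optional blind."""
    out = check_seed(k_meas_share) + check_seed(k_proof_share)
    if k_blind is not None:  # use_joint_rand == true
        out += check_seed(k_blind)
    return out
```

Note that neither selector (is_leader, use_joint_rand) appears in the output, so a decoder would have to learn both from context.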

WDYT @wangshan? I'd also be curious to hear what @branlwyd makes of this plan.

I like the idea to specify an explicit type for the preparation share (rather than using a generic type like Bytes, to represent the serialized data) -- this will make understanding the specification easier, IMO.

I don't have as strong an opinion on the wire format change, but I'd understand TLS syntax (or another "standard" structured-data specification format) more easily than the current ad-hoc serialization.

But is there a reason to use both the tuple format & the TLS-syntax Prio3InputShare format? Naively, I'd specify only the TLS-syntax format, and write the relevant algorithms to take/return this structure, then say to do standard TLS-syntax serialization for over-the-wire transmission of values. This seems simpler to me than specifying one format for use in the algorithms & one format for the wire (and I suppose defining how to translate between them).

I don't see why we couldn't do that, but we'd have to invent a bit of glue between TLS-syntax and python (@wangshan suggested this). For example, we'd need to make the following code meaningful:

input_shares = [Prio3InputShare(leader_meas_share, leader_proof_share, k_leader_blind)]
for j in range(Prio3.SHARES - 1):
    input_shares.append(Prio3InputShare(k_helper_meas_share[j], k_helper_proof_share[j], k_helper_blind[j]))

FWIW we kind of already do this with the ping-pong API.

One advantage of keeping things split is that we don't have to invent anything new. In particular the main section of the document would just speak in terms of python, then we'd have a separate section that would deal with the (optional, see #58) serialization bits.
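For comparison, the glue needed to make the `Prio3InputShare(...)` calls above meaningful in plain Python could be as small as a dataclass. This is only a sketch: the field names follow the proposed struct, while the SHARES value, the sample data, and the `is_leader` helper are hypothetical, and field elements are modeled as plain integers.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Prio3InputShare:
    # The leader holds field elements; helpers hold a seed instead.
    meas_share: Union[bytes, List[int]]
    proof_share: Union[bytes, List[int]]
    # Present only when joint randomness is in use.
    k_blind: Optional[bytes] = None

    def is_leader(self) -> bool:
        """Hypothetical helper: a leader share is one holding field elements."""
        return isinstance(self.meas_share, list)


# Hypothetical sample values standing in for the sharding outputs.
SHARES = 3
leader_meas_share = [1, 2]
leader_proof_share = [3, 4, 5]
k_leader_blind = b"\x00" * 16
k_helper_meas_share = [bytes([j + 1]) * 16 for j in range(SHARES - 1)]
k_helper_proof_share = [bytes([j + 11]) * 16 for j in range(SHARES - 1)]
k_helper_blind = [bytes([j + 21]) * 16 for j in range(SHARES - 1)]

input_shares = [Prio3InputShare(leader_meas_share, leader_proof_share, k_leader_blind)]
for j in range(SHARES - 1):
    input_shares.append(Prio3InputShare(k_helper_meas_share[j],
                                        k_helper_proof_share[j],
                                        k_helper_blind[j]))
```

With this in place the algorithms can pass structured values around, and serialization can still live in its own section.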

I've put a PR up to address the bulk of this issue, PTAL. The remaining parts are documented as TODOs.