powdr-labs/powdr

Witness generation for multiplicities in LogUp / bus argument

Opened this issue · 8 comments

We currently do not generate the multiplicities column when using our LogUp-based PIL lookup implementation. In fact, the test just uses a fixed column.

We should:

  • Identify the multiplicities column
    • #1378 is meant to introduce an explicit annotation for this, but we could also do some basic pattern-matching for now
  • When processing a lookup, we should increment the multiplicity of the looked-up row and return the column as part of the first-phase witness generation.
  • By implementing this, do we still need to keep the same test (assigning col fixed m or col witness m)? Does the corresponding constraints change if we just have col witness m? If not, how can we return the "multiplicities" as a witness column?
  • Should we implement this by adding a machine detection part as in the double-sorted witness machine? (like you mentioned in #1378, by identifying, do you mean for example by the column name like "multiplicity")?

Right, I think the constraints should just how they are, but the multiplicity should turn into a fixed column. Having it as a fixed column in the test was only a hack because we don't have witness generation yet.

do you mean for example by the column name like "multiplicity"
Yes, exactly, as long as we don't have #1378.

I think, for now, we would have both, the built-in lookup and the "library" version, which adds a stage-0 multiplicity column and a few stage-1 column. If the lookup spawns multiple namespaces (as they typically do, because they connect two machines), the CompositeBackend will remove the native lookup anyway, so it basically already acts as an annotation in witgen.

A little bit background on "machine detection": For witgen, we currently group columns into "machines" by looking at the connected components after removing lookups and permutation (so if two columns appear together in a polynomial identity, they are in the same machine).

Actually, now that I think about it, I think a good first step would be to add a lookup-via-bus, analogous to the permutation-via-bus that you added before, because:

  • That makes the machine detection simpler, because we wouldn't have constraints that depend on both the LHS and RHS of a lookup.
  • It's what we'll actually for VADCOP, so maybe we don't even have to support std::protocols::lookup as a first step.

And then we have to detect the multiplicity and increment the counters. I think it makes sense to start with the FixedLookup machine (I think this might be the only one where we really use lookups in practice).

Right, I think the constraints should just how they are, but the multiplicity should turn into a fixed column. Having it as a fixed column in the test was only a hack because we don't have witness generation yet.

I think you mean that "multiplicity" should turn into a witness column, not a fixed column right?

For the machine detection part, which part of the code do you look at the connected components, and what do you mean by the connected components? Is there any example for it? And, which part of the code you remove lookups and permutation?

If we don't have the constraints dependent on LHS and RHS, how do we make the machine detection simpler in this case?

Should we detect the multiplicity in the generated .pil file somehow?

I think you mean that "multiplicity" should turn into a witness column, not a fixed column right?

Yes!

For the machine detection part, which part of the code do you look at the connected components, and what do you mean by the connected components? Is there any example for it?

In witgen, there is this machine extractor: It takes as input all the identities and groups them into machines. By connected components I just mean that two witnesses belong to the same machine if they appear together in a polynomial identity, or together on the LHS or RHS of a lookup / permutation.

If we don't have the constraints dependent on LHS and RHS, how do we make the machine detection simpler in this case?

Take test_data/std/permutation_via_challenges.asm as an example: The LHS is [a1, a2], the RHS is [b1, b2], so if this was a "native" lookup (first_four $ [a1, a2] in (1 - first_four) [b1, b2]), the above algorithm would detect two machines: One with witness [a1, a2] and one with [b1, b2], because they also don't depend together in any polynomial identity. This is the behavior we want.

But std::protocol::permutation generates this constraint:

((1 - main.first_four) * (std::prelude::challenge(0, 2) - (std::prelude::challenge(0, 1) * main.b1 + main.b2) - 1) + 1) * main.z' = (main.first_four * (std::prelude::challenge(0, 2) - (std::prelude::challenge(0, 1) * main.a1 + main.a2) - 1) + 1) * main.z;

So I believe the machine extractor as it works now would put everything in one machine.

In contrast, test_data/std/bus_permutation_via_challenges.asm generates these two constraints:

(std::prelude::challenge(0, 2) - (std::prelude::challenge(0, 1) * (std::prelude::challenge(0, 1) + main.a1') + main.a2')) * (main.z' - main.z * (1 - main.is_first')) = main.first_four';
(std::prelude::challenge(0, 2) - (std::prelude::challenge(0, 1) * (std::prelude::challenge(0, 1) + main.b1') + main.b2')) * (main.u' - main.u * (1 - main.is_first')) = 21888242871839275222246405745257275088548364400416034343698204186575808495616 * (1 - main.first_four');

The first one is only concerned with the LHS, the second one only with the RHS, so the machine extraction algorithm still works!

And, which part of the code you remove lookups and permutation?

That's unrelated to witgen I guess, but any time we prove using the "composite" backend (e.g. --prove-with halo2-mock-composite on the CLI), it creates a proof for each namespace (which corresponds to machines in ASM) and ignores any lookups and permutations between them. This is the code that does this.

Of course, in the end, we should not have a native lookup / permutation in the first place. Instead, there should be just an annotation (#1378), so it's explicit that this only tells witgen how machines are connected and does not enforce any constraint. Then a function like std::protocols::permutation_via_bus::permutation can generate both the constraint and the annotation.

I'm not sure how/where to detect multiplicities, and where to calculate after the detection.

You mentioned that it makes to start with the FixedLookup machine. So, what the FixedLookup::new function should return in this case?

What I understand from this issue:

  • Need to find a place to group lookup columns into somewhere (not sure where to group these)
  • Need to find a place to group the witness columns (fixed columns for testing purposes cause) which we'll look for
  • Need to detect the multiplicity column in a place (not sure how/where)

And, one confusing point for me is in which phase/stage I should calculate how many times the looked-up values occur in the lookup table. Looks like it should be a better idea to calculate the multiplicities in stage-0 and do the witness generation in stage-1 maybe?

Yeah, it's not super clear how that best fits into the current codebase, let's have a call tomorrow :)

And, one confusing point for me is in which phase/stage I should calculate how many times the looked-up values occur in the lookup table. Looks like it should be a better idea to calculate the multiplicities in stage-0 and do the witness generation in stage-1 maybe?

I think the multiplicities can (and probably should, for security?) go into a stage-0 column, as their value does not depend on any challenge.

  • Should I write the detection by implementing a new machine or inside the FixedLookup? (Cause I think fixed lookup is just for fixed columns, isn't it?)
  • If I implement a new machine and keep track the multiplicities inside the process_plookup/process_plookup_internal, how the witness generator uses the process_plookup function for this specific machine?

I think it should still stay in FixedLookup. The lookup is still for fixed columns only, it's just the multiplicity witness column that is part of the lookup argument.

I haven't thought this through 100%, but I think I would go in small steps like:

  • Adding a PR that just implements the counting and logs the final multiplicities in the trace log, so we can manually verify.
  • Doing a very simple detection of the multiplicities column (as it will be explicit with the annotation anyway). Like: For all lookups, if there is an identity referencing all RHS columns + a challenge, identify the extra witness column. That's the associated multiplicities column. Or even simpler, like matching for the name ;)