Missing Spillage for Lookups / Range Constraints

Question

Missing Spillage for Lookups / Range Constraints

DavePearce opened this issue 6 months ago · 11 comments

As far as I can tell, this is a bug: spillage is not being correctly computed for lookups involving expressions. Specifically:

(module m1)
(defcolumns X)
(deflookup test (m2.A) ((shift (~ X) -2)))

(module m2)
(defcolumns A)

With corset debug -s we get this:

=== Spilling ===
        m2:    0
        m1:    0

This feels wrong to me, since it is not consistent with the spillage computed for e.g. this:

(module m1)
(defcolumns X)
(defconstraint test () (vanishes! (shift (~ X) -2)))

UPDATE: This may also extend to range constraints.

Answer 1 · 2024-05-20T15:36:16.000Z

I'm not sure this is wrong, because you could remove the shift in the lookup, it'll be exactly the same lookup. We would get a spillage of 2 if you have let's say a lookup (m1.A m1.B) into (m2.A (shift m2.B 2)) : what's important is the relative shifting between target columns (or source). You could add a 10 shift to all source columns, it should be exactly the same.

Answer 2 · 2024-05-20T20:58:47.000Z

You could add a 10 shift to all source columns, it should be exactly the same

I'm not really following you here though. In my example, its the combination of a normalisation ~ and a shift which is the issue. Anyways, if there really is a problem here, then I should be able to craft a minimal example which illustrates it. If I can't ... then there's no problem :)

Answer 3 · 2024-05-21T08:21:09.000Z

I'm not really following you here though

What I mean is that a lookup

(deflookup module1-into-moduleb 
(m1.A m1.B)
(shift m2.A n)  (shift M2.B n)))

is imho the same whatever the value of n.

its the combination of a normalisation ~

What does it change that there is a normalisation for the spilling ?

Answer 4 · 2024-05-21T22:37:56.000Z

What does it change that there is a normalisation for the spilling ?

Well, spillage is necessary to handle cases where padding can break automatically generated constraints. Normlisation is one example where this can arise, but not the only one.

Let's take an example:

(defcolumns X)
;; Enforce (X[k] = 0) ∨ (X[k-1] = 1)
(defconstraint test (* X (- 1 (shift X -1)))

Some valid traces for this constraint are:

{ "X": [] }
{ "X": [0] }
{ "X": [1] }

Now, the last trace is a problem. If we pad with 0 then the trace becomes:

{ "X": [0,1] }

This is now an invalid trace and will be rejected. However, in fact, we would consider this to be a mistake by the author of the constraint. They have failed to account for potential padding by using e.g. a guard such as :guard STAMP. In short, it is their responsibility to ensure their constraints work correctly in the face of arbitrary amounts of padding.

Now, the real problem with spillage arises for constraints that are added by the corset tool. They cannot be controlled by the user e.g. with a guard (as works above). Instead, the corset tool must ensure constraints it creates will always hold no matter how much padding is applied. This is what spillage is about.

Normalisation is one example where corset adds hidden constraints. In this case, it adds constraints that a column is the multiplicative inverse of another. Here's an example:

(defcolumns X Y)
;; Enforce (X[k] = 0) ∨ (X[k+1] ≠ Y[k])
(defconstraint test () (* X (- 1 (~ (- (shift X 1) Y))))

Corset expands the above automatically to something like this:

(defcolumns X invE)
;; Enforce (X[k] = 0) ∨ (X[k+1] ≠ Y[k])
(defconstraint test () (* X (- 1 invE))
;; [ADDED] Enforce (e⁻¹ ≠ 0) ⇒ (1 = e·e⁻¹) where e = X[k+1] - Y[k]
(defconstraint inv_1 () (* invE (eq! 1 (- (shift X 1) Y))))
;; [ADDED] Enforce (e ≠ 0) ⇒ (1 = e·e⁻¹) where e = X[k+1] - Y[k]
(defconstraint inv_2 () (* X (eq! 1 (- (shift X 1) Y))))

Here, invE is the new column introduced by corset holding the multiplicative inverse of the expression X[k+1] - Y[k]. Now some valid traces for these constraints are (where e.g. 3⁻¹ is the multiplicative inverse of 3):

{ "X": [], "Y": [], "invE": [] }
{ "X": [1], "Y": [2], "invE": [0] }
{ "X": [2,4,1], "Y": [1,1,2], "invE": [3⁻¹, 0, ?0] }

Here, ?0 indicates a don't care value which could be anything (i.e. it is unconstrained). By default we'll just assume corset puts in 0.

Now, finally, we come to the issue. Suppose we pad the last trace with one row of zeros, to give the following (where ; separates padding rows from the rest):

{ "X": [0; 2,4,1], "Y": [0; 1,1,2], "invE": [0; 3⁻¹,0,?0] }

This trace will be rejected by the constraint inv_2. Specifically, invE[0] holds 0 which is not the multiplicative inverse of X[1] - Y[0]. Instead, it should hold 2⁻¹.

Oh boy, this example is long (sorry). The challenge for corset is to work out the right values for invE. Let's start from our corrected trace now:

{ "X": [0; 2,4,1], "Y": [0; 1,1,2], "invE": [2⁻¹; 3⁻¹,0,?0] }

And, suppose we add another row of padding. This is what we need:

{ "X": [0,0; 2,4,1], "Y": [0,0; 1,1,2], "invE": [0,2⁻¹; 3⁻¹,0,?0] }

So, the point here is that the padding values needed for invE are not constant. In this case, they are mostly constant. That is, they are of the form [2⁻¹; ...], [0,2⁻¹; ...], [0,0,2⁻¹; ...], etc. In general, the number of padding rows that could require different values is determined by the maximum positive shift n. Whilst the prover allows a single padding value to be used for any given column, it does not support multiple changing values. Instead, corset adds these padding values itself (which is called spillage).

Phew. Sorry. Hopefully that helps.

@delehef Be great if you could check my explanation here actually makes sense.

Answer 5 · 2024-05-21T22:39:08.000Z

@delehef Also, my argument above implies my PR to optimise spillage is wrong. I've done it the wrong way around!!! Agree?

Answer 6 · 2024-05-21T22:48:10.000Z

@letypequividelespoubelles Apologies but my explanation above doesn't explain why this matters for lookups. The reason is that looking up based on an expression introduces a new column in a similar way as for invE above. So, roughly I guess something like this:

(module m1)
(defcolumns X)
(deflookup test (m2.A) ((+ X 1)))

Is translated into this:

(module m1)
(defcolumns X Xp1)
(deflookup test (m2.A) (Xp1))
;; [ADDED] Enforce that `Xp1 = X+1`
(defconstraint X_plus_1 () (eq! Xp1 (+ X 1)))

That's still not enough to be relevant for spillage. To break things, we need e.g. a lookup involving a normalisation expression to get us into a similar situation as we had above where corset has to pad certain specific values to meet the constraints inv_1 and inv_2. Currently, corset will not do this and that means traces can be rejected after padding rows are added.

Answer 7 · 2024-05-21T23:11:12.000Z

Be great if you could check my explanation here actually makes sense.

You nailed it.

I've done it the wrong way around!!! Agree?

My brain is not working 100% now, but I think it's still the past padding; the ungabunga solution is to write your example and see which one makes the constraint pass ^^

Answer 8 · 2024-05-22T01:39:51.000Z

@delehef Thanks!

My brain is not working 100% now, but I think it's still the past padding; the ungabunga solution is to write your example and see which one makes the constraint pass ^^

Yeah, I think I'm gonna have to do that. Otherwise, I'm not going to be confident.

Answer 9 · 2024-05-23T02:21:49.000Z

So, I've realised that writing a test case is not so easy.

(Padding) I need to be able to force at least two rows of padding to really illustrate the problem.
(Traces)And, we need to get Corset to check a raw trace without recomputing any values.

It doesn't seem like the CLI supports either of these options.

Answer 10 · 2024-05-23T21:38:31.000Z

I think the solution is to allow the corset check command to insert a specified amount of padding (e.g. --padding=16). Then I can properly test this.

Answer 11 · 2024-05-24T08:32:26.000Z

ok, understood, thanks for the explanation, I better understand now.

The case I have in mind where we would like a min nb of padding because of a lookup, without new column computed :

(deflookup 
(m1.C m1.C)
(m2.C (shift m2.C -n))

here we would like to add n padding rows to column C of module2 (with value of 0). Which is different than the definition of spillage you gave "nb of non-zeros needed in the padding before constant 0". But in terms of line counting, it's exactly the same.

The only use case I see currently is for the lookup HUB into SHAKIRA (and n=1 in this case). We dealt with it by trivially add one 0 in the trace as a first row, which gives 2 padding rows with the general padding row added for all modules.