viperproject/silicon

Unrolling loops blows up verification time

Closed this issue · 6 comments

sakehl commented

TL;DR: When trying to verify code that originated from an unrolled loop, verification time blows up. Is there anyway to verify this without blowing up?

Context

For my research, I am verifying code written in C, passed to VerCors, and which in turn uses Viper/silicon The code contains unrolled loops.

The C programs look similar to:

// Unroll inner dimension N times
#define N 4 
void simple(int[] inp, int[] f){
  for(int y=0; y<100; y++){
    for(int xo=0;xo<10;y++){
      f[(y*10 + xo)*N + 0] = inp[(y*10 + xo)*N + 0] +  inp[((y+1)*10 + xo)*N + 0];
      f[(y*10 + xo)*N + 0] = inp[(y*10 + xo)*N + 1] +  inp[((y+1)*10 + xo)*N + 1];
      f[(y*10 + xo)*N + 0] = inp[(y*10 + xo)*N + 2] +  inp[((y+1)*10 + xo)*N + 2];
      f[(y*10 + xo)*N + 0] = inp[(y*10 + xo)*N + 3] +  inp[((y+1)*10 + xo)*N + 3];
    }
  }
}

where the inner loop was unrolled 4 times.

Eventually I want to verify this condition.

forall x,y; 0<=x && x<10*N && 0<=y && y<100 ==> f[y*10*N + x] == inp[y*10*N + x] + inp[(y+1)*10*N + x] 

Note: this program is artificial, the actual code contains many for-loops, which are unrolled several time each. The behaviour is similar though.

Viper Code

Translated to Viper this looks like below, where I also added annotations that are needed to verify this program.

Viper code
import <decreases/int.vpr>

domain array  {
  
  function array_loc(a: array, i: Int): Ref 
  
  function alen(a: array): Int 
  
  function loc_inv_1(loc: Ref): array 
  
  function loc_inv_2(loc: Ref): Int 
  
  axiom {
    (forall a: array, i: Int ::
      { array_loc(a, i) }
      loc_inv_1(array_loc(a, i)) == a && loc_inv_2(array_loc(a, i)) == i)
  }
  
  axiom {
    (forall a: array :: { alen(a) } alen(a) >= 0)
  }
}

field int: Int

function aloc(a: array, i: Int): Ref
  requires 0 <= i
  requires i < alen(a)
  decreases 
  ensures loc_inv_1(result) == a
  ensures loc_inv_2(result) == i
{
  array_loc(a, i)
}

method simple(tid: Int, inp: array, f: array)
  requires inp != f
  requires alen(inp) == 10 * 4 * 101
  requires alen(f) == 10 * 4 * 100
  requires (forall i_j: Int ::
      { aloc(inp, i_j) }
      0 <= i_j && i_j < 10 * 4 * 101 ==>
      acc(aloc(inp, i_j).int, wildcard))
  requires (forall i_j: Int ::
      { aloc(f, i_j) }
      0 <= i_j && i_j < 10 * 4 * 100 ==> acc(aloc(f, i_j).int, 1 * write))
  ensures inp != f
  ensures alen(inp) == 10 * 4 * 101
  ensures alen(f) == 10 * 4 * 100
  ensures (forall i_j: Int ::
      { aloc(inp, i_j) }
      0 <= i_j && i_j < 10 * 4 * 101 ==>
      acc(aloc(inp, i_j).int, wildcard))
  ensures (forall i_j: Int ::
      { aloc(f, i_j) }
      0 <= i_j && i_j < 10 * 4 * 100 ==> acc(aloc(f, i_j).int, 1 * write))
  ensures (forall i_j: Int ::
      { aloc(f, i_j) }
      0 <= i_j && i_j < 4 * 10 * 100 ==>
      aloc(f, i_j).int ==
      aloc(inp, i_j).int + aloc(inp, i_j + 10 * 4).int)
  decreases
{
  var y: Int
  var xo: Int
  y := 0
  while (y < 100)
    invariant 0 <= y && y <= 100
    decreases 100 - y
    invariant (forall i_j: Int ::
        { aloc(inp, i_j) }
        0 <= i_j && i_j < 10 * 4 * 101 ==>
        acc(aloc(inp, i_j).int, wildcard))
    invariant (forall i_j: Int ::
        { aloc(f, i_j) }
        0 <= i_j && i_j < 10 * 4 * 100 ==>
        acc(aloc(f, i_j).int, 1 * write))
    invariant (forall xif_xf_yf: Int ::
        { aloc(f, xif_xf_yf) }
        0 <= xif_xf_yf && xif_xf_yf < 4 * 10 * y ==>
        aloc(f, xif_xf_yf).int ==
        aloc(inp, xif_xf_yf).int +
        aloc(inp, xif_xf_yf + 10 * 4).int) 
  {
    xo := 0
    while (xo < 10)
      invariant 0 <= xo && xo <= 10
      decreases 10 - xo
      invariant (forall i_j: Int ::
          { aloc(inp, i_j) }
          0 <= i_j && i_j < 10 * 4 * 101 ==>
          acc(aloc(inp, i_j).int, wildcard))
      invariant (forall xif_xf: Int ::
          { aloc(f, xif_xf) }
          0 <= xif_xf - y * 10 * 4 && xif_xf - y * 10 * 4 < 10 * 4 ==>
          acc(aloc(f, xif_xf).int, 1 * write))
      invariant (forall xif_xf: Int ::
          { aloc(f, xif_xf) }
          0 <= xif_xf - y * 10 * 4 && xif_xf - y * 10 * 4 < xo * 4 ==>
          aloc(f, xif_xf).int ==
          aloc(inp, xif_xf).int +
          aloc(inp, xif_xf + 10 * 4).int) 
    {
      aloc(f, (y * 10 + xo) * 4 + 0).int := aloc(inp, (y * 10 + xo) * 4 + 0).int + aloc(inp, (y * 10 + xo) * 4 + 10 * 4 + 0).int
      aloc(f, (y * 10 + xo) * 4 + 1).int := aloc(inp, (y * 10 + xo) * 4 + 1).int + aloc(inp, (y * 10 + xo) * 4 + 10 * 4 + 1).int
      aloc(f, (y * 10 + xo) * 4 + 2).int := aloc(inp, (y * 10 + xo) * 4 + 2).int + aloc(inp, (y * 10 + xo) * 4 + 10 * 4 + 2).int
      aloc(f, (y * 10 + xo) * 4 + 3).int := aloc(inp, (y * 10 + xo) * 4 + 3).int + aloc(inp, (y * 10 + xo) * 4 + 10 * 4 + 3).int
      xo := xo + 1
    }
    y := y + 1
  }
}

Verifying this code takes 3.9 s on my machine. When I increase N, this blows up:

File Run 1 (s) Run 2 (s) Run 3 (s)
simple_4.vpr 3.99290132522583 3.9889650344848633 3.8979477882385254
simple_8.vpr 14.978343963623047 15.859347105026245 14.444764852523804
simple_10.vpr 38.03391981124878 32.59281373023987 33.86797595024109
simple_12.vpr 71.55659675598145 78.24415516853333 71.22353076934814
simple_14.vpr T.O. 171.31528210639954 173.9857680797577
simple_16.vpr 533.8845517635345 360.4852297306061 356.3802511692047

I'm running this with Z3_EXE=/usr/bin/z3 java -Xss8M -jar silicon/target/scala-*/silicon.jar simple_4.vpr And silicon version Silicon 1.1-SNAPSHOT (56ae917d+).

Question: Is there any way around this? Are there options to pass to Z3 which would deal better with this?

My suspicion is that verification time blows up, since the quantifier directly above the unrolled loop, must match N quantifier instantiations with N statements where f is written. This is kind of like finding the correct pairing for N items. So that is O($N^2$) time?

If there are any questions please let me know. Below is the jupyter notebook I used to generate the above table. (rename to UnrollViper.ipynb)

UnrollViper.txt

I have a few ideas.
Could you maybe post the N=10 and N=12 versions so that I can try them out and see if they make a difference?

sakehl commented

Thanks a lot for the quick reply. Below are the versions!

N=10
import <decreases/int.vpr>

domain array  {
  
  function array_loc(a: array, i: Int): Ref 
  
  function alen(a: array): Int 
  
  function loc_inv_1(loc: Ref): array 
  
  function loc_inv_2(loc: Ref): Int 
  
  axiom {
    (forall a: array, i: Int ::
      { array_loc(a, i) }
      loc_inv_1(array_loc(a, i)) == a && loc_inv_2(array_loc(a, i)) == i)
  }
  
  axiom {
    (forall a: array :: { alen(a) } alen(a) >= 0)
  }
}

field int: Int

function aloc(a: array, i: Int): Ref
  requires 0 <= i
  requires i < alen(a)
  decreases 
  ensures loc_inv_1(result) == a
  ensures loc_inv_2(result) == i
{
  array_loc(a, i)
}

method simple(tid: Int, inp: array, f: array)
  requires inp != f
  requires alen(inp) == 10 * 10 * 101
  requires alen(f) == 10 * 10 * 100
  requires (forall i_j: Int ::
      { aloc(inp, i_j) }
      0 <= i_j && i_j < 10 * 10 * 101 ==>
      acc(aloc(inp, i_j).int, wildcard))
  requires (forall i_j: Int ::
      { aloc(f, i_j) }
      0 <= i_j && i_j < 10 * 10 * 100 ==> acc(aloc(f, i_j).int, 1 * write))
  ensures inp != f
  ensures alen(inp) == 10 * 10 * 101
  ensures alen(f) == 10 * 10 * 100
  ensures (forall i_j: Int ::
      { aloc(inp, i_j) }
      0 <= i_j && i_j < 10 * 10 * 101 ==>
      acc(aloc(inp, i_j).int, wildcard))
  ensures (forall i_j: Int ::
      { aloc(f, i_j) }
      0 <= i_j && i_j < 10 * 10 * 100 ==> acc(aloc(f, i_j).int, 1 * write))
  ensures (forall i_j: Int ::
      { aloc(f, i_j) }
      0 <= i_j && i_j < 10 * 10 * 100 ==>
      aloc(f, i_j).int ==
      aloc(inp, i_j).int + aloc(inp, i_j + 10 * 10).int)
  decreases
{
  var y: Int
  var xo: Int
  y := 0
  while (y < 100)
    invariant 0 <= y && y <= 100
    decreases 100 - y
    invariant (forall i_j: Int ::
        { aloc(inp, i_j) }
        0 <= i_j && i_j < 10 * 10 * 101 ==>
        acc(aloc(inp, i_j).int, wildcard))
    invariant (forall i_j: Int ::
        { aloc(f, i_j) }
        0 <= i_j && i_j < 10 * 10 * 100 ==>
        acc(aloc(f, i_j).int, 1 * write))
    invariant (forall xif_xf_yf: Int ::
        { aloc(f, xif_xf_yf) }
        0 <= xif_xf_yf && xif_xf_yf < 10 * 10 * y ==>
        aloc(f, xif_xf_yf).int ==
        aloc(inp, xif_xf_yf).int +
        aloc(inp, xif_xf_yf + 10 * 10).int) 
  {
    xo := 0
    while (xo < 10)
      invariant 0 <= xo && xo <= 10
      decreases 10 - xo
      invariant (forall i_j: Int ::
          { aloc(inp, i_j) }
          0 <= i_j && i_j < 10 * 10 * 101 ==>
          acc(aloc(inp, i_j).int, wildcard))
      invariant (forall xif_xf: Int ::
          { aloc(f, xif_xf) }
          0 <= xif_xf - y * 10 * 10 && xif_xf - y * 10 * 10 < 10 * 10 ==>
          acc(aloc(f, xif_xf).int, 1 * write))
      invariant (forall xif_xf: Int ::
          { aloc(f, xif_xf) }
          0 <= xif_xf - y * 10 * 10 && xif_xf - y * 10 * 10 < xo * 10 ==>
          aloc(f, xif_xf).int ==
          aloc(inp, xif_xf).int +
          aloc(inp, xif_xf + 10 * 10).int) 
    {
      aloc(f, (y * 10 + xo) * 10 + 0).int := aloc(inp, (y * 10 + xo) * 10 + 0).int + aloc(inp, (y * 10 + xo) * 10 + 10 * 10 + 0).int
      aloc(f, (y * 10 + xo) * 10 + 1).int := aloc(inp, (y * 10 + xo) * 10 + 1).int + aloc(inp, (y * 10 + xo) * 10 + 10 * 10 + 1).int
      aloc(f, (y * 10 + xo) * 10 + 2).int := aloc(inp, (y * 10 + xo) * 10 + 2).int + aloc(inp, (y * 10 + xo) * 10 + 10 * 10 + 2).int
      aloc(f, (y * 10 + xo) * 10 + 3).int := aloc(inp, (y * 10 + xo) * 10 + 3).int + aloc(inp, (y * 10 + xo) * 10 + 10 * 10 + 3).int
      aloc(f, (y * 10 + xo) * 10 + 4).int := aloc(inp, (y * 10 + xo) * 10 + 4).int + aloc(inp, (y * 10 + xo) * 10 + 10 * 10 + 4).int
      aloc(f, (y * 10 + xo) * 10 + 5).int := aloc(inp, (y * 10 + xo) * 10 + 5).int + aloc(inp, (y * 10 + xo) * 10 + 10 * 10 + 5).int
      aloc(f, (y * 10 + xo) * 10 + 6).int := aloc(inp, (y * 10 + xo) * 10 + 6).int + aloc(inp, (y * 10 + xo) * 10 + 10 * 10 + 6).int
      aloc(f, (y * 10 + xo) * 10 + 7).int := aloc(inp, (y * 10 + xo) * 10 + 7).int + aloc(inp, (y * 10 + xo) * 10 + 10 * 10 + 7).int
      aloc(f, (y * 10 + xo) * 10 + 8).int := aloc(inp, (y * 10 + xo) * 10 + 8).int + aloc(inp, (y * 10 + xo) * 10 + 10 * 10 + 8).int
      aloc(f, (y * 10 + xo) * 10 + 9).int := aloc(inp, (y * 10 + xo) * 10 + 9).int + aloc(inp, (y * 10 + xo) * 10 + 10 * 10 + 9).int
      xo := xo + 1
    }
    y := y + 1
  }
}
N=12
import <decreases/int.vpr>

domain array  {
  
  function array_loc(a: array, i: Int): Ref 
  
  function alen(a: array): Int 
  
  function loc_inv_1(loc: Ref): array 
  
  function loc_inv_2(loc: Ref): Int 
  
  axiom {
    (forall a: array, i: Int ::
      { array_loc(a, i) }
      loc_inv_1(array_loc(a, i)) == a && loc_inv_2(array_loc(a, i)) == i)
  }
  
  axiom {
    (forall a: array :: { alen(a) } alen(a) >= 0)
  }
}

field int: Int

function aloc(a: array, i: Int): Ref
  requires 0 <= i
  requires i < alen(a)
  decreases 
  ensures loc_inv_1(result) == a
  ensures loc_inv_2(result) == i
{
  array_loc(a, i)
}

method simple(tid: Int, inp: array, f: array)
  requires inp != f
  requires alen(inp) == 10 * 12 * 101
  requires alen(f) == 10 * 12 * 100
  requires (forall i_j: Int ::
      { aloc(inp, i_j) }
      0 <= i_j && i_j < 10 * 12 * 101 ==>
      acc(aloc(inp, i_j).int, wildcard))
  requires (forall i_j: Int ::
      { aloc(f, i_j) }
      0 <= i_j && i_j < 10 * 12 * 100 ==> acc(aloc(f, i_j).int, 1 * write))
  ensures inp != f
  ensures alen(inp) == 10 * 12 * 101
  ensures alen(f) == 10 * 12 * 100
  ensures (forall i_j: Int ::
      { aloc(inp, i_j) }
      0 <= i_j && i_j < 10 * 12 * 101 ==>
      acc(aloc(inp, i_j).int, wildcard))
  ensures (forall i_j: Int ::
      { aloc(f, i_j) }
      0 <= i_j && i_j < 10 * 12 * 100 ==> acc(aloc(f, i_j).int, 1 * write))
  ensures (forall i_j: Int ::
      { aloc(f, i_j) }
      0 <= i_j && i_j < 12 * 10 * 100 ==>
      aloc(f, i_j).int ==
      aloc(inp, i_j).int + aloc(inp, i_j + 10 * 12).int)
  decreases
{
  var y: Int
  var xo: Int
  y := 0
  while (y < 100)
    invariant 0 <= y && y <= 100
    decreases 100 - y
    invariant (forall i_j: Int ::
        { aloc(inp, i_j) }
        0 <= i_j && i_j < 10 * 12 * 101 ==>
        acc(aloc(inp, i_j).int, wildcard))
    invariant (forall i_j: Int ::
        { aloc(f, i_j) }
        0 <= i_j && i_j < 10 * 12 * 100 ==>
        acc(aloc(f, i_j).int, 1 * write))
    invariant (forall xif_xf_yf: Int ::
        { aloc(f, xif_xf_yf) }
        0 <= xif_xf_yf && xif_xf_yf < 12 * 10 * y ==>
        aloc(f, xif_xf_yf).int ==
        aloc(inp, xif_xf_yf).int +
        aloc(inp, xif_xf_yf + 10 * 12).int) 
  {
    xo := 0
    while (xo < 10)
      invariant 0 <= xo && xo <= 10
      decreases 10 - xo
      invariant (forall i_j: Int ::
          { aloc(inp, i_j) }
          0 <= i_j && i_j < 10 * 12 * 101 ==>
          acc(aloc(inp, i_j).int, wildcard))
      invariant (forall xif_xf: Int ::
          { aloc(f, xif_xf) }
          0 <= xif_xf - y * 10 * 12 && xif_xf - y * 10 * 12 < 10 * 12 ==>
          acc(aloc(f, xif_xf).int, 1 * write))
      invariant (forall xif_xf: Int ::
          { aloc(f, xif_xf) }
          0 <= xif_xf - y * 10 * 12 && xif_xf - y * 10 * 12 < xo * 12 ==>
          aloc(f, xif_xf).int ==
          aloc(inp, xif_xf).int +
          aloc(inp, xif_xf + 10 * 12).int) 
    {
      aloc(f, (y * 10 + xo) * 12 + 0).int := aloc(inp, (y * 10 + xo) * 12 + 0).int + aloc(inp, (y * 10 + xo) * 12 + 10 * 12 + 0).int
      aloc(f, (y * 10 + xo) * 12 + 1).int := aloc(inp, (y * 10 + xo) * 12 + 1).int + aloc(inp, (y * 10 + xo) * 12 + 10 * 12 + 1).int
      aloc(f, (y * 10 + xo) * 12 + 2).int := aloc(inp, (y * 10 + xo) * 12 + 2).int + aloc(inp, (y * 10 + xo) * 12 + 10 * 12 + 2).int
      aloc(f, (y * 10 + xo) * 12 + 3).int := aloc(inp, (y * 10 + xo) * 12 + 3).int + aloc(inp, (y * 10 + xo) * 12 + 10 * 12 + 3).int
      aloc(f, (y * 10 + xo) * 12 + 4).int := aloc(inp, (y * 10 + xo) * 12 + 4).int + aloc(inp, (y * 10 + xo) * 12 + 10 * 12 + 4).int
      aloc(f, (y * 10 + xo) * 12 + 5).int := aloc(inp, (y * 10 + xo) * 12 + 5).int + aloc(inp, (y * 10 + xo) * 12 + 10 * 12 + 5).int
      aloc(f, (y * 10 + xo) * 12 + 6).int := aloc(inp, (y * 10 + xo) * 12 + 6).int + aloc(inp, (y * 10 + xo) * 12 + 10 * 12 + 6).int
      aloc(f, (y * 10 + xo) * 12 + 7).int := aloc(inp, (y * 10 + xo) * 12 + 7).int + aloc(inp, (y * 10 + xo) * 12 + 10 * 12 + 7).int
      aloc(f, (y * 10 + xo) * 12 + 8).int := aloc(inp, (y * 10 + xo) * 12 + 8).int + aloc(inp, (y * 10 + xo) * 12 + 10 * 12 + 8).int
      aloc(f, (y * 10 + xo) * 12 + 9).int := aloc(inp, (y * 10 + xo) * 12 + 9).int + aloc(inp, (y * 10 + xo) * 12 + 10 * 12 + 9).int
      aloc(f, (y * 10 + xo) * 12 + 10).int := aloc(inp, (y * 10 + xo) * 12 + 10).int + aloc(inp, (y * 10 + xo) * 12 + 10 * 12 + 10).int
      aloc(f, (y * 10 + xo) * 12 + 11).int := aloc(inp, (y * 10 + xo) * 12 + 11).int + aloc(inp, (y * 10 + xo) * 12 + 10 * 12 + 11).int
      xo := xo + 1
    }
    y := y + 1
  }
}

Thanks.
For the unrolled assignments, Silicon seems to be looking for the relevant heap chunks in the worst possible order, but it's not clear to me how that could be avoided in general, nor why it's that slow.
As an immediate workaround I'd recommend to just use Carbon as a backend instead; it seems to behave much better. But I'll try to figure out if there's a way to improve Silicon's performance here.

PR #728 should essentially fix the order issue, but Silicon is still a lot slower than Carbon on this example, and I'll have to check if the different order slows Silicon down on other examples.

PR is merged, so it should scale at least somewhat better now (but still worse than Carbon).

sakehl commented

Thanks a lot for the quick replies and work! I will test this. For the more extended versions of my experiments, I unfortunately could not get Carbon to finish in a reasonable time.