calyxir/calyx

Wrong outputs for new NTT function.

Closed this issue · 4 comments

Similar to before, this is producing bogus outputs when lowered to FuTIL and simulated. The overall structure isn't much different from the other Cooley-Tukey algorithm.

What I've tried:

  • Lowering to C++. The results are correct, which means the algorithm is written correctly.
  • Using larger bit widths.
  • (Ab)using ---.
  • Saving multi-cycle outputs to temporaries, e.g. tmp = a % b.

I think next step is walking through cycles with Scansion. Open to other suggestions as well.

Algorithm

decl a: bit<32>[8];

decl Q: bit<32>[1];

decl phis: bit<32>[8];

decl a_ntt: bit<32>[8];
---

let n: ubit<5> = 8;
---
let q: bit<32> = Q[0];

---

let idx: ubit<5> = 0;
---
while (idx < n) {
  a_ntt[idx] := a[idx];
  ---
  idx := idx + 1;
}
---
let t: ubit<5> = n;
let m: ubit<5> = 1;
---
while (m < n) {
  t := t >> 1;
  ---
  let i: ubit<5> = 0;
  ---
  while (i < m) {
    let t_shifted: ubit<5> = (t << 1);
    ---
    let j1: ubit<5> = i * t_shifted;
    ---
    let j2: ubit<5> = j1 + t - 1;
    ---
    let S: bit<32> = phis[m + i];
    ---
    let j: ubit<5> = j1;
    ---
    while (j < j2 + 1) {
      let U: bit<32> = a_ntt[j];
      ---
      let tmp: bit<32> = a_ntt[j + t];
      ---
      let V: bit<32> = tmp * S;
      ---
      let x: bit<32> = U + V;
      ---
      a_ntt[j] := x % q;
      ---
      let y: bit<32> = U - V;
      ---
      a_ntt[j + t] := y % q;
      ---
      j := j + 1;
      ---
    }
    i := i + 1;
    ---
  }
  m := m << 1;
  ---
}

Data

{
  "a0": {
    "bitwidth": 32,
    "data":  [213, 823, 479, 176, 488, 106, 964, 287]
  },
  "phis0": {
    "bitwidth": 32,
    "data":  [1, 1813, 6761, 300, 1160, 2626, 5883, 3269]
  },
  "Q0": {
    "bitwidth": 32,
    "data": [8017]
  },
  "a_ntt0": {
    "bitwidth": 32,
    "data": [0, 0, 0, 0, 0, 0, 0, 0]
  }
}

Expected

{
  "Q0": [
    8017
  ],
  "a0": [
    213,
    823,
    479,
    176,
    488,
    106,
    964,
    287
  ],
  "a_ntt0": [
    7156,
    5161,
    7230,
    818,
    6296,
    3638,
    7296,
    4194 
  ],
  "phis0": [
    1,
    1813,
    6761,
    300,
    1160,
    2626,
    5883,
    3269
  ]
}

Super fun! One good way to debug such things is to try to narrow down the difference so it's as small as possible. That is, remove lots of stuff from the code, causing it to produce a "wrong" output—but then still check whether the Verilog simulation and the C++ execution differ. What's the smallest program you can produce (which may bear only a remote resemblance to NTT) where the two outputs are different? Once you have that, the underlying problem often becomes obvious.

Super fun! One good way to debug such things is to try to narrow down the difference so it's as small as possible. That is, remove lots of stuff from the code, causing it to produce a "wrong" output—but then still check whether the Verilog simulation and the C++ execution differ. What's the smallest program you can produce (which may bear only a remote resemblance to NTT) where the two outputs are different? Once you have that, the underlying problem often becomes obvious.

This helped, thanks. The issue:

   ...
   }
    // No `---` here.
    i := i + 1;
    ---
  }
  // No `---` here.
  m := m << 1;
  ---
}

--- shouldn't randomly change the behavior of a program. Can you add an issue in the Dahlia repo about this? Specifically which file is expected to work and what change made it work.

--- shouldn't randomly change the behavior of a program. Can you add an issue in the Dahlia repo about this? Specifically which file is expected to work and what change made it work.

Sure, let me try and produce a smaller example. I was just playing with it, and it only happens if I don't include --- after m := m << 1;. The behavior remains unchanged whether --- is after i := i + 1 or not. This, at the moment, doesn't really say much to me.