Switch to CW floats
Opened this issue · 17 comments
Due to the PD environment change, we no longer support DW float IP. As a result, all the tests need to migrate to CW IPs.
@jack-melchert
Running regressions with CW IP. Found couple errors with current lassen. There are several failed tests. My guess is they are all related to the same bug.
p = op(inst=Inst(alu=ALU_t.FP_sub, signed=Signed_t.unsigned, lut=0, cond=Cond_t.Z, rega=Mode_t.BYPASS, data0=0, regb=Mode_... rege=Mode_t.BYPASS, bit1=Bit(False), regf=Mode_t.BYPASS, bit2=Bit(False)), func=<function <lambda> at 0x7f7af442fcb0>)
args = (<class 'hwtypes.fp_vector.FPVector[8,7,RoundingMode.RNE,False]'>(-0.0), <class 'hwtypes.fp_vector.FPVector[8,7,RoundingMode.RNE,False]'>(0.0))
@pytest.mark.parametrize("op", [
op(asm.fp_add(), lambda x, y: x + y),
op(asm.fp_sub(), lambda x, y: x - y),
op(asm.fp_mul(), lambda x, y: x * y)
])
@pytest.mark.parametrize("args",
[(BFloat16.random(), BFloat16.random()) for _ in range(NTESTS)] +
list(product(fp_sign_vec+fp_zero_vec, fp_sign_vec+fp_zero_vec))
)
def test_fp_binary_op(op, args):
inst = op.inst
in0 = args[0]
in1 = args[1]
out = op.func(in0, in1)
data0 = BFloat16.reinterpret_as_bv(in0)
data1 = BFloat16.reinterpret_as_bv(in1)
res, res_p, _ = pe(inst, data0, data1)
assert res == BFloat16.reinterpret_as_bv(out)
if CAD_ENV:
> rtl_tester(op, data0, data1, res=res)
tests/test_pe.py:209:
Here is an easy way to reproduce:
def test_fp_mul():
# Regression test for https://github.com/StanfordAHA/lassen/issues/111
inst = asm.fp_add()
data0 = Data(0x8000)
data1 = Data(0x8000)
res, res_p, _ = pe(inst, data0, data1)
if CAD_ENV:
rtl_tester(inst, data0, data1, res=res)
else:
pytest.skip("Skipping since DW not available")
Thanks for letting me know, I'll take a look.
Here is the diff if you want to switch to CW IP:
diff --git a/tests/rtl_utils.py b/tests/rtl_utils.py
index bdf3a24..9629148 100644
--- a/tests/rtl_utils.py
+++ b/tests/rtl_utils.py
@@ -50,10 +50,10 @@ test_dir = "tests/build"
# the coreir context causing a "redefinition of module" error
magma.backend.coreir_.CoreIRContextSingleton().reset_instance()
magma.compile(f"{test_dir}/WrappedPE", pe_circuit, output="coreir-verilog",
- coreir_libs={"float_DW"})
+ coreir_libs={"float_CW"})
# check if we need to use ncsim + cw IP
-cw_dir = "/cad/synopsys/dc_shell/J-2014.09-SP3/dw/sim_ver/" # noqa
+cw_dir = "/cad/cadence/GENUS_19.10.000_lnx86/share/synth/lib/chipware/sim/verilog/CW/" # noqa
CAD_ENV = shutil.which("ncsim") and os.path.isdir(cw_dir)
@@ -109,13 +109,14 @@ def rtl_tester(test_op, data0=None, data1=None, bit0=None, bit1=None, bit2=None,
tester.circuit.O1.expect(res_p)
if CAD_ENV:
# use ncsim
- libs = ["DW_fp_mult.v", "DW_fp_add.v", "DW_fp_addsub.v"]
+ libs = ["CW_fp_mult.v", "CW_fp_add.v"]
for filename in libs:
copy_file(os.path.join(cw_dir, filename),
os.path.join(test_dir, filename))
tester.compile_and_run(target="system-verilog", simulator="ncsim",
directory="tests/build/",
include_verilog_libraries=libs,
+ flags=["-sv"],
skip_compile=True)
else:
libs = ["DW_fp_mult.v", "DW_fp_add.v"]
CoreIR has the ieee_complicance bit set to false when instantiating the CW and DW fp units. Changing this to 1 fixes this lassen issue.
I'm not sure why it was set to 0 in the first place, but hopefully that will get resolved soon.
Email message from Alex back in June 2019 about this issue:
"Ran synthesis with ieee_compliance on… it increases pe tile area by about 10% , and it also affects our critical path for the PE. It did not meet timing at 454 MHz (2.2ns period), whereas the core without ieee_compliance did.
Alex"
I suspect the reason we wanted to disable denormal numbers was due to the increase in area and longer critical paths.
The PD team should decide if this area/critical path increase is possible before we switch it. Otherwise we should disable any tests that contain denormalized numbers.
I don't think the test case here is related to denorm issue. 0x8000
is negative 0, a behavior specified by IEEE standard, as seen here: https://en.wikipedia.org/wiki/Signed_zero
The test case basically means if we have -0 + -0
, we will get x
.
Based on CW fp_add
documentation, when ieee_compliance=0
, this is expected behavior:
A sA.expA.sigA B sB.expB.sigB Infinitely Precision Result (F)
± Zero ± Zero ± Zero
The documentation also states that -0
is not a denormal number.
Sign (s) Exponent (exp) Significand (sig) Value Short Name
s 0 0 (-1)s0 ±Zero
s 0 0 < sig (-1)s 21-bias0.sig Denormal
I see. Are the X's originating from the CW_fp_add module itself? Or are we getting Xs as a result of some logic in lassen
I did cross-examination on the documentation and waveforms, and I'm leaning towards stating that this is a bug in the CW_fp_add
simulation IP.
For those of you who have access to Cadence support, this is the link to the documentation.
Here is the signals coming in/out from the IP:
The status
flag is set properly: 01
means zero. However, the z
is set to x
.
I do not have access to the documentation, but I wonder if this is specified in the documentation. It might say that if the status indicates 0, then the z output could be anything (although that is a very stupid design decision)
I don't think it's specified anywhere in the documentation. 0x0000 + 0x000
gives correct z
and status
.
Here is my take on how to fix this:
- turn on
ieee_compliance
. this is ideal if timing/area allowed - extra logic to gate output to zero if
status
indicates zero.
I suspect 2) would be a cheaper (area/timing) solution. I can stamp out a mux (with +0) in the CoreIR implementation. Tests still might break if they are looking for a -0 though.
I asked Kathleen to synthesize PE with ieee_compliance
on. I will report back the result once it's finished.
Did more testing and it seems like 0 + 0
returns x
as well. -0 + 0
, however, returns the correct result.
Targeting at 1.1ns
clock speed, here is the timing/area comparison. Notice that both CW_fp_add
and CW_fp_mult
are modified.
TL;DR; timing is similar; area is less than 2% increase.
ieee_compliance=1
:
- timing:
8 Slack (ps): -4
9 R2R (ps): 0
10 I2R (ps): -4
11 R2O (ps): 66
12 I2O (ps): -0
13 CG (ps): 1
- area: 7274.558
ieee_compliance=0
:
- timing:
8 Slack (ps): -6
9 R2R (ps): 0
10 I2R (ps): -6
11 R2O (ps): 72
12 I2O (ps): 0
13 CG (ps): 0
- area: 7151.742
@rdaly525 I got green light to turn on ieee_compliance
. Would you please make necessary changes sometime today or tomorrow so that we can test more? Thanks.
#179 and rdaly525/coreir#973 have the changes