Some DFFRAM configurations have hold violations
antonblanchard opened this issue · 12 comments
After updating to the latest openlane (which changes STA from using an ideal clock to an actual clock), DFFRAM is seeing hold violations. A simple test:
./dffram.py --size 32x32
shows:
[WARNING]: There are hold violations in the design at the typical corner. Please refer to /mnt/dffram/build/32x32_DEFAULT/openlane/runs/RUN_2022.03.02_22.00.30/reports/routing/13-parasitics_sta.min.rpt.
The two DFFRAMs I'm using in Microwatt, both of which have hold violations, are:
./dffram.py --size 32x64 --variant 1RW1R --min-height 180
./dffram.py --size 512x64 --vertical-halo 100 --horizontal-halo 20
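To see which paths fail without reading the whole report, the summary lines can be filtered out of the file that the warning points to. The following is a minimal sketch, assuming the report ends each path with a line of the form `<value> slack (VIOLATED)` or `<value> slack (MET)`, as in the excerpts in this thread. The helper name `violated_slacks` and the `0.12 slack (MET)` sample line are hypothetical and only there for illustration.

```python
import re

# Matches summary lines such as "-3.31 slack (VIOLATED)" or "0.12 slack (MET)".
SLACK_LINE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+slack \((VIOLATED|MET)\)\s*$")


def violated_slacks(report_text):
    """Return the slack of every path marked VIOLATED in the report text."""
    slacks = []
    for line in report_text.splitlines():
        match = SLACK_LINE.match(line)
        if match and match.group(2) == "VIOLATED":
            slacks.append(float(match.group(1)))
    return slacks


# Hypothetical sample: the first three lines mirror the report format in this
# thread, the MET line is made up to show that passing paths are skipped.
sample = """\
6.40 data required time
-3.09 data arrival time
-3.31 slack (VIOLATED)
0.12 slack (MET)
"""

print(violated_slacks(sample))  # [-3.31]
```

For a real run, the text would come from the report file named in the warning, for example via `pathlib.Path(report_path).read_text()`.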
@donn please share the timing report(s) with me so I can investigate the cause of these hold violations.
@shalan I'll have to harvest them. @antonblanchard, do you have them on hand?
Here's an example when building a 512x64 DFFRAM. The clock makes it to the output buffering stage a long time before the memory elements.
======================= Typical Corner ===================================
Startpoint: Di0[1] (input port clocked by CLK)
Endpoint: BANK128[0].RAM128.BLOCK[3].RAM32.SLICE[2].RAM8.WORD[2].W.BYTE[0].B.BIT[1].genblk1.STORAGE
(positive level-sensitive latch clocked by CLK')
Path Group: CLK
Path Type: min
Corner: tt
Fanout Cap Slew Delay Time Description
-----------------------------------------------------------------------------
0.00 0.00 clock CLK (rise edge)
0.00 0.00 clock network delay (propagated)
2.24 2.24 v input external delay
0.47 0.31 2.56 v Di0[1] (in)
8 0.21 Di0[1] (net)
0.47 0.00 2.56 v BANK128[0].RAM128.DIBUF[1]/A (sky130_fd_sc_hd__clkbuf_16)
0.08 0.34 2.89 v BANK128[0].RAM128.DIBUF[1]/X (sky130_fd_sc_hd__clkbuf_16)
4 0.08 BANK128[0].RAM128.BLOCK[0].RAM32.Di0[1] (net)
0.08 0.02 2.91 v BANK128[0].RAM128.BLOCK[3].RAM32.DIBUF[1]/A (sky130_fd_sc_hd__clkbuf_16)
0.06 0.18 3.09 v BANK128[0].RAM128.BLOCK[3].RAM32.DIBUF[1]/X (sky130_fd_sc_hd__clkbuf_16)
32 0.07 BANK128[0].RAM128.BLOCK[3].RAM32.Di0_buf[1] (net)
0.06 0.00 3.09 v BANK128[0].RAM128.BLOCK[3].RAM32.SLICE[2].RAM8.WORD[2].W.BYTE[0].B.BIT[1].genblk1.STORAGE/D (sky130_fd_sc_hd__dlxtp_1)
3.09 data arrival time
0.00 0.00 clock CLK' (fall edge)
0.00 0.00 clock source latency
3.04 2.37 2.37 ^ CLK (in)
20 0.69 CLK (net)
3.40 0.00 2.37 ^ BANK128[0].RAM128.CLKBUF[3]/A (sky130_fd_sc_hd__clkbuf_4)
1.44 1.33 3.70 ^ BANK128[0].RAM128.CLKBUF[3]/X (sky130_fd_sc_hd__clkbuf_4)
10 0.51 BANK128[0].RAM128.BLOCK[3].RAM32.CLK (net)
1.47 0.19 3.90 ^ BANK128[0].RAM128.BLOCK[3].RAM32.CLKBUF/A (sky130_fd_sc_hd__clkbuf_2)
0.14 0.37 4.26 ^ BANK128[0].RAM128.BLOCK[3].RAM32.CLKBUF/X (sky130_fd_sc_hd__clkbuf_2)
4 0.02 BANK128[0].RAM128.BLOCK[3].RAM32.CLK_buf (net)
0.14 0.00 4.27 ^ BANK128[0].RAM128.BLOCK[3].RAM32.SLICE[2].RAM8.CLKBUF/A (sky130_fd_sc_hd__clkbuf_2)
0.13 0.20 4.47 ^ BANK128[0].RAM128.BLOCK[3].RAM32.SLICE[2].RAM8.CLKBUF/X (sky130_fd_sc_hd__clkbuf_2)
8 0.02 BANK128[0].RAM128.BLOCK[3].RAM32.SLICE[2].RAM8.CLK_buf (net)
0.13 0.00 4.47 ^ BANK128[0].RAM128.BLOCK[3].RAM32.SLICE[2].RAM8.WORD[2].W.CLKBUF/A (sky130_fd_sc_hd__clkbuf_4)
1.32 0.92 5.39 ^ BANK128[0].RAM128.BLOCK[3].RAM32.SLICE[2].RAM8.WORD[2].W.CLKBUF/X (sky130_fd_sc_hd__clkbuf_4)
16 0.45 BANK128[0].RAM128.BLOCK[3].RAM32.SLICE[2].RAM8.WORD[2].W.BYTE[0].B.CLK (net)
1.43 0.31 5.70 ^ BANK128[0].RAM128.BLOCK[3].RAM32.SLICE[2].RAM8.WORD[2].W.BYTE[0].B.genblk1.CLKINV/A (sky130_fd_sc_hd__inv_1)
0.20 0.15 5.85 v BANK128[0].RAM128.BLOCK[3].RAM32.SLICE[2].RAM8.WORD[2].W.BYTE[0].B.genblk1.CLKINV/Y (sky130_fd_sc_hd__inv_1)
1 0.00 BANK128[0].RAM128.BLOCK[3].RAM32.SLICE[2].RAM8.WORD[2].W.BYTE[0].B.CLK_B (net)
0.20 0.00 5.85 v BANK128[0].RAM128.BLOCK[3].RAM32.SLICE[2].RAM8.WORD[2].W.BYTE[0].B.genblk1.CG/CLK (sky130_fd_sc_hd__dlclkp_1)
0.14 0.30 6.15 v BANK128[0].RAM128.BLOCK[3].RAM32.SLICE[2].RAM8.WORD[2].W.BYTE[0].B.genblk1.CG/GCLK (sky130_fd_sc_hd__dlclkp_1)
8 0.03 BANK128[0].RAM128.BLOCK[3].RAM32.SLICE[2].RAM8.WORD[2].W.BYTE[0].B.GCLK (net)
0.14 0.00 6.15 v BANK128[0].RAM128.BLOCK[3].RAM32.SLICE[2].RAM8.WORD[2].W.BYTE[0].B.BIT[1].genblk1.STORAGE/GATE (sky130_fd_sc_hd__dlxtp_1)
0.25 6.40 clock uncertainty
0.00 6.40 clock reconvergence pessimism
0.00 6.40 library hold time
6.40 data required time
-----------------------------------------------------------------------------
6.40 data required time
-3.09 data arrival time
-----------------------------------------------------------------------------
-3.31 slack (VIOLATED)
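The arithmetic behind the last few lines of the report above can be reproduced by hand. This is a rough sketch using only numbers copied from that report; the function name `hold_slack` is hypothetical. For a min (hold) path, the required time is the capture clock arrival at the latch GATE pin plus clock uncertainty, reconvergence pessimism and library hold time, and the slack is the data arrival time minus that required time.

```python
def hold_slack(data_arrival, capture_clock_arrival, clock_uncertainty,
               reconvergence_pessimism, library_hold):
    """Return (data required time, hold slack), both rounded to 2 decimals."""
    required = (capture_clock_arrival + clock_uncertainty
                + reconvergence_pessimism + library_hold)
    return round(required, 2), round(data_arrival - required, 2)


# Values taken from the 512x64 report above:
#   3.09 data arrival at STORAGE/D, 6.15 clock arrival at STORAGE/GATE,
#   0.25 clock uncertainty, 0.00 reconvergence pessimism, 0.00 library hold.
required, slack = hold_slack(3.09, 6.15, 0.25, 0.00, 0.00)
print(required, slack)  # 6.4 -3.31
```

The data reaches the latch 3.31 ns before the capture edge (plus uncertainty) does, which is why the path is reported as a hold violation.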
The hold violation is caused by a bad constraint on this input-to-register timing path. To fix it, we need to adjust the driving-cell constraint for input ports so that it is realistic, e.g., clkbuf_4 instead of inv_1. This would reduce the clock latency and slew at the input.
Also, I noticed a few minor issues in the clock tree; fixing them would make it more robust.
I discussed the fixes with @donn and they should be out soon.
Should be fixed now
Okay, so everything but 8x* and the register file should be good now. @antonblanchard Mind testing?
Thank you @donn, the cache RAMs (32x64_1RW1R) have no hold violations. My main RAM (512x64) still has hold violations unfortunately:
./dffram.py --size 512x64 --vertical-halo 100 --horizontal-halo 20
Fanout Cap Slew Delay Time Description
-----------------------------------------------------------------------------
0.00 0.00 clock CLK (rise edge)
0.00 0.00 clock network delay (propagated)
3.75 3.75 v input external delay
0.14 0.10 3.85 v Di0[1] (in)
8 0.22 Di0[1] (net)
0.16 0.00 3.85 v BANK128[1].RAM128.DIBUF[1]/A (sky130_fd_sc_hd__clkbuf_16)
0.07 0.21 4.06 v BANK128[1].RAM128.DIBUF[1]/X (sky130_fd_sc_hd__clkbuf_16)
4 0.07 BANK128[1].RAM128.BLOCK[0].RAM32.Di0[1] (net)
0.07 0.00 4.06 v BANK128[1].RAM128.BLOCK[0].RAM32.DIBUF[1]/A (sky130_fd_sc_hd__clkbuf_16)
0.06 0.17 4.23 v BANK128[1].RAM128.BLOCK[0].RAM32.DIBUF[1]/X (sky130_fd_sc_hd__clkbuf_16)
32 0.07 BANK128[1].RAM128.BLOCK[0].RAM32.Di0_buf[1] (net)
0.06 0.00 4.23 v BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[0].B.BIT[1].genblk1.STORAGE/D (sky130_fd_sc_hd__dlxtp_1)
4.23 data arrival time
0.00 0.00 clock CLK' (fall edge)
0.00 0.00 clock source latency
0.67 1.09 1.09 ^ CLK (in)
8 0.64 CLK (net)
1.87 0.00 1.09 ^ BANK128[1].RAM128.CLKBUF/A (sky130_fd_sc_hd__clkbuf_4)
0.25 0.60 1.69 ^ BANK128[1].RAM128.CLKBUF/X (sky130_fd_sc_hd__clkbuf_4)
8 0.08 BANK128[1].RAM128.BLOCK[0].RAM32.CLK (net)
0.25 0.00 1.69 ^ BANK128[1].RAM128.BLOCK[0].RAM32.CLKBUF/A (sky130_fd_sc_hd__clkbuf_4)
1.10 0.84 2.53 ^ BANK128[1].RAM128.BLOCK[0].RAM32.CLKBUF/X (sky130_fd_sc_hd__clkbuf_4)
5 0.37 BANK128[1].RAM128.BLOCK[0].RAM32.CLK_buf (net)
1.10 0.01 2.54 ^ BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.CLKBUF/A (sky130_fd_sc_hd__clkbuf_2)
0.14 0.35 2.88 ^ BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.CLKBUF/X (sky130_fd_sc_hd__clkbuf_2)
8 0.02 BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.CLK_buf (net)
0.14 0.00 2.88 ^ BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.CLKBUF/A (sky130_fd_sc_hd__clkbuf_4)
1.42 0.94 3.83 ^ BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.CLKBUF/X (sky130_fd_sc_hd__clkbuf_4)
16 0.49 BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[0].B.CLK (net)
1.60 0.41 4.23 ^ BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[0].B.genblk1.CLKINV/A (sky130_fd_sc_hd__inv_1)
0.21 0.16 4.39 v BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[0].B.genblk1.CLKINV/Y (sky130_fd_sc_hd__inv_1)
1 0.01 BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[0].B.CLK_B (net)
0.21 0.00 4.39 v BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[0].B.genblk1.CG/CLK (sky130_fd_sc_hd__dlclkp_1)
0.16 0.32 4.71 v BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[0].B.genblk1.CG/GCLK (sky130_fd_sc_hd__dlclkp_1)
8 0.03 BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[0].B.GCLK (net)
0.16 0.00 4.71 v BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[0].B.BIT[1].genblk1.STORAGE/GATE (sky130_fd_sc_hd__dlxtp_1)
0.25 4.96 clock uncertainty
0.00 4.96 clock reconvergence pessimism
0.01 4.97 library hold time
4.97 data required time
-----------------------------------------------------------------------------
4.97 data required time
-4.23 data arrival time
-----------------------------------------------------------------------------
-0.74 slack (VIOLATED)
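As a rough comparison, the summary numbers of the two standalone 512x64 reports in this thread can be set side by side. This is only a sketch: the two reports cover different endpoints (BANK128[0] WORD[2] in the first, BANK128[1] WORD[7] in the second), so the deltas are indicative rather than exact, and the dictionary and function names are hypothetical.

```python
# Summary values copied from the two standalone 512x64 reports in this thread.
before_fix = {"data_arrival": 3.09, "data_required": 6.40}
after_fix = {"data_arrival": 4.23, "data_required": 4.97}


def slack(path):
    """Hold slack = data arrival time - data required time."""
    return round(path["data_arrival"] - path["data_required"], 2)


# How much later the data now arrives, and how much earlier the capture
# edge is now required; both changes help hold timing.
data_side_gain = round(after_fix["data_arrival"] - before_fix["data_arrival"], 2)
clock_side_gain = round(before_fix["data_required"] - after_fix["data_required"], 2)
total_gain = round(slack(after_fix) - slack(before_fix), 2)

print(slack(before_fix), slack(after_fix))          # -3.31 -0.74
print(data_side_gain, clock_side_gain, total_gain)  # 1.14 1.43 2.57
```

In the reports themselves, the `CLK (in)` line moved from 2.37 ns delay with 3.04 slew to 1.09 ns delay with 0.67 slew, and the input external delay went from 2.24 to 3.75, which is consistent with both sides contributing to the 2.57 ns improvement while still leaving 0.74 ns to recover.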
Running STA across the entire design (including the 512x64 DFFRAM):
Startpoint: _131570_ (rising edge-triggered flip-flop clocked by user_clock2)
Endpoint: microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[4].B.BIT[5].genblk1.STORAGE
(positive level-sensitive latch clocked by user_clock2')
Path Group: user_clock2
Path Type: min
Corner: tt
Fanout Cap Slew Delay Time Description
-----------------------------------------------------------------------------
0.00 0.00 clock user_clock2 (rise edge)
0.00 0.00 clock source latency
0.43 0.31 0.31 ^ user_clock2 (in)
1 0.09 user_clock2 (net)
0.44 0.00 0.31 ^ repeater12/A (sky130_fd_sc_hd__buf_12)
0.42 0.37 0.68 ^ repeater12/X (sky130_fd_sc_hd__buf_12)
1 0.38 net630 (net)
0.47 0.11 0.78 ^ clkbuf_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
1.15 0.69 1.47 ^ clkbuf_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
16 1.20 clknet_0_user_clock2 (net)
1.42 0.41 1.88 ^ clkbuf_4_12_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.44 0.53 2.42 ^ clkbuf_4_12_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
2 0.08 clknet_4_12_0_user_clock2 (net)
0.44 0.01 2.42 ^ clkbuf_5_25__f_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
0.26 0.36 2.79 ^ clkbuf_5_25__f_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
10 0.25 clknet_5_25__leaf_user_clock2 (net)
0.26 0.03 2.81 ^ clkbuf_leaf_164_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
0.10 0.23 3.05 ^ clkbuf_leaf_164_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
17 0.08 clknet_leaf_164_user_clock2 (net)
0.10 0.00 3.05 ^ _131570_/CLK (sky130_fd_sc_hd__dfxtp_1)
0.03 0.30 3.35 v _131570_/Q (sky130_fd_sc_hd__dfxtp_1)
8 0.00 microwatt_0.soc0.bram.bram0.ram_0._4_[37] (net)
0.03 0.00 3.35 v microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.DIBUF[37]/A (sky130_fd_sc_hd__clkbuf_16)
0.07 0.16 3.52 v microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.DIBUF[37]/X (sky130_fd_sc_hd__clkbuf_16)
4 0.08 microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.Di0[37] (net)
0.07 0.00 3.52 v microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.DIBUF[37]/A (sky130_fd_sc_hd__clkbuf_16)
0.08 0.18 3.70 v microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.DIBUF[37]/X (sky130_fd_sc_hd__clkbuf_16)
32 0.09 microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.Di0_buf[37] (net)
0.09 0.01 3.72 v microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[4].B.BIT[5].genblk1.STORAGE/D (sky130_fd_sc_hd__dlxtp_1)
3.72 data arrival time
0.00 0.00 clock user_clock2' (fall edge)
0.00 0.00 clock source latency
0.43 0.34 0.34 ^ user_clock2 (in)
1 0.09 user_clock2 (net)
0.44 0.00 0.34 ^ repeater12/A (sky130_fd_sc_hd__buf_12)
0.42 0.41 0.75 ^ repeater12/X (sky130_fd_sc_hd__buf_12)
1 0.38 net630 (net)
0.47 0.12 0.86 ^ clkbuf_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
1.15 0.76 1.63 ^ clkbuf_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
16 1.20 clknet_0_user_clock2 (net)
1.40 0.43 2.06 ^ clkbuf_4_10_0_user_clock2/A (sky130_fd_sc_hd__clkbuf_2)
0.61 0.69 2.75 ^ clkbuf_4_10_0_user_clock2/X (sky130_fd_sc_hd__clkbuf_2)
2 0.11 clknet_4_10_0_user_clock2 (net)
0.61 0.03 2.78 ^ clkbuf_5_21__f_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
0.31 0.45 3.24 ^ clkbuf_5_21__f_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
9 0.30 clknet_5_21__leaf_user_clock2 (net)
0.34 0.07 3.31 ^ clkbuf_leaf_98_user_clock2/A (sky130_fd_sc_hd__clkbuf_16)
0.07 0.25 3.56 ^ clkbuf_leaf_98_user_clock2/X (sky130_fd_sc_hd__clkbuf_16)
13 0.05 clknet_leaf_98_user_clock2 (net)
0.07 0.00 3.56 ^ microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.CLKBUF/A (sky130_fd_sc_hd__clkbuf_4)
0.23 0.29 3.84 ^ microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.CLKBUF/X (sky130_fd_sc_hd__clkbuf_4)
8 0.08 microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.CLK (net)
0.23 0.00 3.85 ^ microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.CLKBUF/A (sky130_fd_sc_hd__clkbuf_4)
1.10 0.83 4.68 ^ microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.CLKBUF/X (sky130_fd_sc_hd__clkbuf_4)
5 0.37 microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.CLK_buf (net)
1.10 0.01 4.69 ^ microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.CLKBUF/A (sky130_fd_sc_hd__clkbuf_2)
0.14 0.35 5.03 ^ microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.CLKBUF/X (sky130_fd_sc_hd__clkbuf_2)
8 0.02 microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.CLK_buf (net)
0.14 0.00 5.03 ^ microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.CLKBUF/A (sky130_fd_sc_hd__clkbuf_4)
1.42 0.94 5.98 ^ microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.CLKBUF/X (sky130_fd_sc_hd__clkbuf_4)
16 0.49 microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[0].B.CLK (net)
1.54 0.34 6.31 ^ microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[4].B.genblk1.CLKINV/A (sky130_fd_sc_hd__inv_1)
0.21 0.16 6.47 v microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[4].B.genblk1.CLKINV/Y (sky130_fd_sc_hd__inv_1)
1 0.01 microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[4].B.CLK_B (net)
0.21 0.00 6.47 v microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[4].B.genblk1.CG/CLK (sky130_fd_sc_hd__dlclkp_1)
0.17 0.33 6.80 v microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[4].B.genblk1.CG/GCLK (sky130_fd_sc_hd__dlclkp_1)
8 0.03 microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[4].B.GCLK (net)
0.17 0.00 6.80 v microwatt_0.soc0.bram.bram0.ram_0.memory_0/BANK128[1].RAM128.BLOCK[0].RAM32.SLICE[0].RAM8.WORD[7].W.BYTE[4].B.BIT[5].genblk1.STORAGE/GATE (sky130_fd_sc_hd__dlxtp_1)
0.25 7.05 clock uncertainty
-0.15 6.89 clock reconvergence pessimism
0.00 6.90 library hold time
6.90 data required time
-----------------------------------------------------------------------------
6.90 data required time
-3.72 data arrival time
-----------------------------------------------------------------------------
-3.18 slack (VIOLATED)
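The full-design report above can be broken down to show where the capture clock spends its time. The sketch below uses only timestamps copied from that report; the variable names are hypothetical. It separates the top-level clock skew (launch flip-flop clock pin versus the clock arriving at the macro's `BANK128[1].RAM128.CLKBUF/A` pin) from the delay accumulated inside the DFFRAM between that pin and the latch `GATE` pin.

```python
# Timestamps (ns) copied from the full-design report above.
launch_ff_clk = 3.05      # clock at _131570_/CLK
data_at_latch_d = 3.72    # data arrival at STORAGE/D
macro_clk_entry = 3.56    # clock at BANK128[1].RAM128.CLKBUF/A
latch_gate = 6.80         # clock at STORAGE/GATE
clock_uncertainty = 0.25
reconvergence_pessimism = -0.15
library_hold = 0.00

# Flip-flop clock pin to latch D: clk-to-Q plus the DIBUF stages.
launch_to_d = round(data_at_latch_d - launch_ff_clk, 2)
# Skew between the launch flip-flop and the macro clock input at top level.
top_level_skew = round(macro_clk_entry - launch_ff_clk, 2)
# Clock delay accumulated inside the DFFRAM macro.
macro_internal_clock = round(latch_gate - macro_clk_entry, 2)

required = round(latch_gate + clock_uncertainty
                 + reconvergence_pessimism + library_hold, 2)
slack = round(data_at_latch_d - required, 2)

print(launch_to_d, top_level_skew, macro_internal_clock)  # 0.67 0.51 3.24
print(required, slack)                                    # 6.9 -3.18
```

On these numbers, 3.24 ns of the capture path is spent inside the macro's own clock buffering and gating, while the data path from the launching flip-flop to the latch takes only 0.67 ns.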