StanfordAHA/garnet

PD: Scan regs appear out of thin air, prevent routing

Friday-night build 352 failed with LVS errors in both the PE and mem-tile LVS runs. As far as I can tell, the only difference between build 352 and the previously successful build 347 was the sudden appearance of scan registers (flipflops) in place of normal D flipflops during synthesis. The synthesizer appears to be using each scan register as a substitute for a separate mux + register combination; see the sketch below.

[Image: sketch of a scan register replacing a separate mux + register pair]
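To see why the synthesizer is allowed to make this substitution, here is a minimal behavioral sketch (Python, not HDL) of the two next-state functions. It assumes a generic mux-D scan flop in which SE=1 captures SI and SE=0 captures D; the function names are hypothetical and the model ignores timing and reset.

```python
from itertools import product


def mux_then_dff(d: int, alt: int, sel: int) -> int:
    """Next state of a plain D flipflop fed by a 2:1 mux (sel=1 picks alt)."""
    return alt if sel else d


def scan_dff(d: int, si: int, se: int) -> int:
    """Next state of an assumed mux-D scan flipflop: SE=1 captures SI, else D."""
    return si if se else d


def equivalent() -> bool:
    # Exhaustively compare both next-state functions over every 0/1 input,
    # wiring the mux select to SE and the mux's alternate input to SI.
    return all(
        mux_then_dff(d, alt, sel) == scan_dff(d, si=alt, se=sel)
        for d, alt, sel in product((0, 1), repeat=3)
    )


print(equivalent())  # True
```

Because the two are logically identical, a synthesizer that is free to use scan cells can legally fold the mux into the flop, which is the substitution seen in build 352.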

Why is this bad? The normal register (flipflop) is small, has only three ports (D, Q, and clock), and all of its pins are in metal-1 (M1), so there is little exposed M2 in the register itself (DFQD, below).

The scan register is not much bigger, but it has two extra ports, SI and SE, and all of its pins are in metal-2 (M2). As you can see, the D pin would be nearly impossible to reach for a signal attempting its last-mile route in M2 (SDFQD, below).

[Image: regs, comparing the DFQD and SDFQD cell layouts]

Worse, after placement the registers are jailed behind thick vertical M3 power stripes, which severely limits the router's ability to approach any of the M2 pins via M3. In the image below, the M3 tracks near the D pin are already taken by M3 wires (not shown) going to, e.g., the SE and CK pins; with no room left for an M3 path to the D pin, the router's only option is to reach D in M2. As you can see, it chose to run the M2 wire over the top of an existing M2 wire in the register (under those red bricks), creating a short that, in our current flow, is not detected until LVS.

[Image: final-short2, showing the M2 short at the scan register's D pin]

Because the short is not detected immediately, several hours are wasted getting to the final LVS step that flags the error. So the first item on my list is to check for short circuits after each routing step and halt the mflowgen process early if one is detected.
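A minimal sketch of such an early check, assuming (hypothetically) that the routing step writes a plain-text DRC report in which each short appears on its own line containing the word "Short". Real report formats vary by tool and version, so `SHORT_PATTERN` would need adjusting. The helpers `count_shorts` and `check_or_halt` are hypothetical, not part of mflowgen; the idea relies on the assumption that the flow treats a step script's nonzero exit status as a failed step and stops there.

```python
import re
import sys

# Hypothetical report convention: any line mentioning "short" as a whole word
# is counted as one short-circuit violation.
SHORT_PATTERN = re.compile(r"\bshort\b", re.IGNORECASE)


def count_shorts(report_text: str) -> int:
    """Count report lines that mention a short."""
    return sum(1 for line in report_text.splitlines() if SHORT_PATTERN.search(line))


def check_or_halt(report_text: str) -> None:
    """Exit with status 1 if any shorts are reported, so the flow stops early."""
    n = count_shorts(report_text)
    if n:
        print(f"ERROR: {n} short(s) found after routing; halting.", file=sys.stderr)
        sys.exit(1)
    print("No shorts found.")


# Made-up sample report text, for illustration only.
sample = (
    "Metal2 Short: net n123 and net VSS at (10.2, 44.1)\n"
    "Metal3 Spacing violation at (3.0, 5.0)\n"
)
print(count_shorts(sample))  # 1
```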

Next on the to-do list is to actually fix or forestall the short circuits that arise from these new registers. Options include:

  1. Somehow force the synthesizer never to use the problematic flipflops. I'm not sure how to do this; maybe someone else has a good idea?

  2. Delete the power stripe nearest to each short circuit and reroute. With more room, the route may succeed.

  3. Change the spacing of the vertical power stripes so as to allow more room for M3 signal tracks in between.

  4. Any other, better options you can think of?

In my opinion, option 3 is probably the best. It turns out that just a tiny bit of extra spacing between the power stripes can increase the number of interstitial M3 tracks from 2 to 3, potentially yielding benefits throughout the chip. I will be filing a separate issue on this topic.
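The arithmetic behind the 2-to-3 track gain can be sketched as follows. This assumes M3 tracks sit on a fixed grid at integer multiples of the track pitch and that each track centerline needs a fixed clearance from a stripe edge (half the wire width plus minimum spacing). All numbers below are hypothetical, in nm, and are not the real tsmc16 rules; `free_tracks` is an illustrative helper.

```python
def free_tracks(left_edge: int, right_edge: int, pitch: int, clearance: int) -> int:
    """Count grid tracks (at multiples of `pitch`) that fit between two stripes.

    left_edge:  right edge of the left power stripe (nm)
    right_edge: left edge of the right power stripe (nm)
    clearance:  assumed minimum distance from a stripe edge to a track centerline (nm)
    """
    lo = left_edge + clearance      # lowest legal track centerline
    hi = right_edge - clearance     # highest legal track centerline
    if hi < lo:
        return 0
    first = -((-lo) // pitch)       # ceil(lo / pitch) using integer math
    last = hi // pitch              # floor(hi / pitch)
    return max(0, last - first + 1)


# Hypothetical stripes: widening the gap from 280 nm to 340 nm gains one track.
print(free_tracks(1000, 1280, pitch=80, clearance=60))  # 2
print(free_tracks(1000, 1340, pitch=80, clearance=60))  # 3
```

Because tracks are grid-aligned, the count depends on where the stripe edges fall relative to the grid, not just on the gap width, which is why a small spacing change can add a whole track.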

In the short term, I have implemented option 2, and it seems to work as a quick-and-dirty way to get CI running again (we have not had a successful Friday-night build in over a month!). If all goes well, I will file a pull request very shortly to get things back on track.
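The selection part of option 2 can be sketched as below: since the stripes are vertical, only x-coordinates matter, and each short maps to the stripe whose centerline is closest. This is a hypothetical helper, assuming stripe centerlines and short locations have already been extracted from the layout; the actual stripe deletion and reroute would happen in the place-and-route tool.

```python
from bisect import bisect_left


def stripes_to_delete(stripe_xs: list, short_xs: list) -> set:
    """Return the x-centerline of the vertical stripe nearest to each short."""
    xs = sorted(stripe_xs)
    chosen = set()
    if not xs:
        return chosen
    for x in short_xs:
        i = bisect_left(xs, x)
        # Only the stripe just left of x and the one at/just right of x can be nearest.
        candidates = xs[max(0, i - 1): i + 1]
        chosen.add(min(candidates, key=lambda s: abs(s - x)))
    return chosen


# Made-up coordinates (um): shorts at 12.0 and 14.0 both map to the stripe at 10.0.
print(sorted(stripes_to_delete([10.0, 20.0, 30.0], [12.0, 14.0, 29.5])))  # [10.0, 30.0]
```

Deduplicating with a set means several shorts near the same stripe cause only one deletion, keeping the power grid as intact as possible.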

SDFFs appeared because a hack in the default mflowgen genus node was removed (mflowgen/mflowgen@4109aa8). So, to go with option 1, which is what we were doing before, add the following to the adk.tcl:
set ADK_DONT_USE_CELL_LIST "*/E* */G* */*D16* */*D20* */*D24* */*D28* */*D32* */SDF* */*DFM* */*SEDF*"

This obviously covers more than SDFFs, but this is what we were doing before. Also, we should probably stop maintaining this flow anyway as we switch technologies.
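To check which cells a pattern list like the one above actually excludes, one can match candidate names against it. This sketch assumes the patterns behave like ordinary shell-style globs (which Python's `fnmatch.fnmatchcase` implements); the `stdlib/...` library and cell names are simplified, hypothetical examples, not checked against the real tsmc16 libraries.

```python
from fnmatch import fnmatchcase

# Patterns copied from the ADK_DONT_USE_CELL_LIST setting.
DONT_USE = "*/E* */G* */*D16* */*D20* */*D24* */*D28* */*D32* */SDF* */*DFM* */*SEDF*".split()


def is_dont_use(cell: str) -> bool:
    """True if a lib/cell name matches any don't-use glob pattern."""
    return any(fnmatchcase(cell, pat) for pat in DONT_USE)


# Hypothetical lib/cell names for illustration.
for cell in ["stdlib/DFQD1", "stdlib/SDFQD1", "stdlib/SEDFQD1", "stdlib/BUFFD16", "stdlib/INVD1"]:
    print(cell, is_dont_use(cell))
```

Note that `*/SDF*` alone would not catch `SEDF...` cells, which is why the list carries a separate `*/*SEDF*` entry, and that patterns such as `*/*D16*` exclude high-drive cells that have nothing to do with scan.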

It was this exact case, haha, but it should never have been in the default mflowgen node. That node was created quickly from old Genus scripts when we had to switch from DC to Genus because of VDE.

Update: I implemented Alex's fix in the tsmc16 adk.tcl files and am now re-running the failed build to see whether it works; I should know within the next 24 hours: https://buildkite.com/tapeout-aha/fullchip/builds/356

Everything seems to be working, so I will close out this issue.