PD: Nondeterminism in PE tile
steveri opened this issue · 0 comments
While debugging changes to the GLB tile, I found intermittent/nondeterministic failures in the PE tile(!).
Example: Eight different builds used essentially the same RTL, except that the GLB tile sometimes used 64K SRAM and sometimes 256K. Meanwhile, the largely unrelated PE tile would sometimes pass and sometimes fail, regardless of the GLB SRAM size setting. See the table at the end of this issue for a summary of the eight builds.
One of the failures seemed to be related to uniquification problems, so, following the advice in the Innovus message, I set the init_design_uniquify variable to fix that. FYI, the specific message from Innovus was
**WARN: (IMPECO-560): The netlist is not unique, because the module
'Tile_PE_mux_logic_1_20' is instantiated multiple times. Make the
netlist unique by running 'set init_design_uniquify 1' before
loading the design to avoid the problem.
Type 'man IMPECO-560' for more detail.
Another failure that occurred more than once was a short in M6 after postroute. I thought this might be solved by fixing the uniquification problem, but that turned out not to be the case. By trial and error, I found that the short could be fixed by adjusting my existing PE fix-shorts script to do ten eco-route iterations instead of just two. The runtime cost was negligible: ten iterations took only a minute or two longer than two.
+ # setNanoRouteMode -drouteEndIteration 2
+ setNanoRouteMode -drouteEndIteration 10
It appears that sometimes we need one or both of these fixes and sometimes not, depending on randomness in the environment. But I'm hoping that leaving both fixes intact will improve robustness going forward.
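As a rough sketch of how the two fixes might sit together in an Innovus flow: the placement of each setting, the comments about which step it belongs to, and the ecoRoute call are assumptions for illustration, not the actual contents of the PE tile scripts.

```tcl
# Assumption: this runs in the init step, before the netlist is loaded.
# Innovus (IMPECO-560) asks for the variable to be set before loading the design.
set init_design_uniquify 1
init_design   ;# assumes init_verilog, init_lef_file, etc. are already set

# Assumption: this runs in the postroute fix-shorts step.
# Allow up to ten detail-route iterations instead of two.
setNanoRouteMode -drouteEndIteration 10
ecoRoute      ;# assumed here; the real fix-shorts script may differ
```

Since neither setting seems to hurt when the failure does not occur, leaving both enabled unconditionally is the simplest choice.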
Here is a summary of the eight runs that produced intermittent PE tile failures:
GARNET    RUN         SRAM
HASH      NAME        SIZE   RESULT   NOTES
--------------------------------------------------------------------------
212fc7c   glb4129     256K   PASSED   glb_top only
212fc7c   gold.280    256K   FAILED   full_chip context: uniq error + metal short
--------------------------------------------------------------------------
aa69f42   gold.4140   256K   PASSED   was supposed to be same as gold.280
75d1da4   gold.4141   64K    FAILED   uniquification error + metal short
9718ebb   gold.4142   64K    PASSED   uniquified + orig size
4bc99c3   gold.4143   256K   PASSED   uniquified + 4M run
--------------------------------------------------------------------------
6ef7828   gold.285    256K   FAILED   M6 shorts
f370fd8   gold.286    256K   PASSED   using new fix-shorts script
--------------------------------------------------------------------------
I am in the process of filing a couple of pull requests to fix these problems; they should appear soon.