StanfordAHA/garnet

Cannot configure the PE reg mode when it's stalled

Kuree opened this issue · 12 comments

Kuree commented

Not sure where to put this. Could be lassen related.

It seems that when the stall signal is high, the reg mode register's clk_en stays low, so the register cannot be written during configuration. As a result, when we stall the chip during configuration, the PE core is unresponsive.

This needs to be fixed ASAP.

This is the root cause for this bug report: #534

@Kuree to clarify, is it (a) that the "mode configuration register" cannot be written during configuration, or (b) that the PE data registers cannot be written during configuration (for reg const mode)?

If it is (a) that seems really weird, b/c that is a config register like the rest (e.g. writing to opcode). (b) seems possible given the code.

Kuree commented

It's (b). Sorry for the confusion; I used the name from the Verilog. The configuration tries to write a constant to the operand register but fails to do so.

No worries, thanks for clarifying. I think I know the source, I will take a look.

Looking at the lassen code, it looks like the config_we bit has the highest priority for writing to the register. If that value is high, the register should be written with config_data (independent of clk_en). https://github.com/StanfordAHA/lassen/blob/53ebc50d18b6ccb4c72f8cc904cbb9a785418279/lassen/mode.py#L28
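To make the priority described above concrete, here is a minimal Python sketch of the next-state rule. It is not the lassen code; the function name and the `current` and `data_in` arguments are hypothetical, and only `config_we`, `config_data`, and `clk_en` come from the comment.

```python
def next_register_value(current, config_we, config_data, clk_en, data_in):
    """Illustrative next-state rule: a config write outranks clk_en.

    All names except config_we, config_data and clk_en are assumptions
    made for this sketch.
    """
    if config_we:
        # Highest priority: a config write goes through even if clk_en is low.
        return config_data
    if clk_en:
        # Normal operation: capture the data input.
        return data_in
    # Neither a config write nor an enabled clock: hold the old value.
    return current
```

Under this rule a stalled PE (clk_en low) would still accept the constant as long as config_we is actually asserted, which is why the wiring of config_we is the next thing to check.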

Kuree commented

The config_we seems to be low all the time.

@rsetaluri can you check how garnet is wiring up the config_we (PE.config_en) and config_data for the PEs? I'll verify that config_we is appropriately routed from the PE interface to RegMode.

@Kuree what's the config address? It looks like the two data registers are assigned to address 0x3 and will only be written to when config_addr[2:0] == 3'h3.
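As a rough illustration of that check, the condition config_addr[2:0] == 3'h3 can be written in Python as below. The helper name is hypothetical, and how a full garnet configuration address maps onto the PE-level config_addr is not stated in this thread, so the sketch only covers the low-bits comparison itself.

```python
DATA_REG_ADDR = 0x3  # address quoted in the comment above


def selects_data_register(config_addr: int) -> bool:
    """Return True when the low three bits of config_addr equal 3.

    Hypothetical helper: mirrors config_addr[2:0] == 3'h3 and nothing more.
    """
    return (config_addr & 0b111) == DATA_REG_ADDR
```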

Kuree commented

00000405 009DDC00
01000405 48000200
02000405 00000000

Kuree commented

@rdaly525 I think the issue might be somewhere else. I will keep looking.

Well, there are technically two registers: the Mode register inside the PE and the const initialization config register sitting outside the PE.

When Mode is "Const", it uses the value of the config register.
When Mode is "Delay", it uses the value of the mode register.

To read from or write to the mode register, the low bits of the address need to be 3. Given what I said, though, it is rather pointless to write to the mode register in its current form.
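The selection between the two registers described above could be sketched as follows. This is only an illustration of the two modes named in this comment; the enum, the function, and the argument names are hypothetical and are not taken from lassen.

```python
from enum import Enum


class Mode(Enum):
    # Only the two modes mentioned in this comment are modelled here.
    CONST = "Const"
    DELAY = "Delay"


def operand_value(mode, config_reg_value, mode_reg_value):
    """Pick the operand source for the given mode (illustrative only)."""
    if mode is Mode.CONST:
        # Const: use the config register that sits outside the PE.
        return config_reg_value
    if mode is Mode.DELAY:
        # Delay: use the Mode register inside the PE.
        return mode_reg_value
    raise ValueError(f"mode not covered by this sketch: {mode}")
```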

Is it the hardcoding of those addresses that's the problem? If so, it seems we should always have that problem, whether or not the chip is stalled.

@Kuree are you able to write to the register in non-stall mode? (if that is even possible)

Kuree commented

I think I found the root cause of the problem.

When the application starts, we assert a soft reset signal that clears the pipeline registers used in the design. The counter in the design has enable over reset. During the stalling phase, the pipeline registers are always x, since they never take in any signal. Since the counter is a self loop, its enable signal is computed from x. As a result, x is locked inside the routing fabric and we are never able to reset the pipeline registers.

Here are two possible fixes:

  1. Have the compiler generate a reset-over-enable counter.
  2. Have the PE handle x properly.
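The difference between the two register styles can be seen in a small three-valued simulation. This is a simplified sketch, not the garnet or lassen RTL: x is modelled as `None`, the step functions and the self-loop wiring in `simulate` are assumptions made for illustration, and an unknown enable is treated pessimistically as producing x.

```python
X = None  # stand-in for an unknown (x) value


def step_enable_over_reset(state, en, rst, d):
    """Enable has priority: reset is only seen when en is a known 1."""
    if en is X:
        return X          # unknown enable, so the next state is unknown
    if en == 0:
        return state      # disabled: hold, even if rst is asserted
    if rst is X:
        return X
    return 0 if rst else d


def step_reset_over_enable(state, en, rst, d):
    """Reset has priority: rst == 1 clears the register regardless of en."""
    if rst == 1:
        return 0
    if rst is X or en is X:
        return X
    return d if en else state


def simulate(step, cycles=3):
    """Self-loop counter that starts at x while reset is held high."""
    state = X
    for _ in range(cycles):
        # The enable and the next count both depend on the register's own output.
        en = X if state is X else 1
        d = X if state is X else state + 1
        state = step(state, en, rst=1, d=d)
    return state
```

In this model `simulate(step_enable_over_reset)` stays at x no matter how long reset is held, while `simulate(step_reset_over_enable)` settles at 0, which matches the reasoning behind fix 1.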

Given the time frame we have, I think we should just use software to fix this bug.