Bypassing DRAM causes an unreasonably large number of scalar fills in SRAM
CaiTH0618 opened this issue · 7 comments
Hello.
I was trying to make Inputs and Outputs bypass DRAM (it makes sense to keep intermediate results on-chip between layers), but the simulated energy per compute was unreasonably large: pJ/Compute = 1011585881926010.750.
I checked timeloop-model.stats.txt and found that Scalar fills (per-instance) for Outputs at the GlobalBuffer level was 18446744073709547520, whereas I expected 0 with my mapping. However, timeloop-model.map.txt looked correct.
When I removed the DRAM bypass setting, everything looked normal: pJ/Compute = 8.226.
Can someone explain why? Thank you.
Below are my YAML files:
architecture:
  version: 0.3
  subtree:
    - name: System
      local:
        - name: Dram
          class: DRAM
          attributes:
            type: LPDDR4
            width: 32
            block-size: 4
            word-bits: 8
      subtree:
        - name: Chip
          attributes:
            technology: 40nm
          local:
            - name: GlobalBuffer
              class: SRAM
              attributes: # 1MB
                depth: 32768
                width: 32
                block-size: 4
                word-bits: 8
          subtree:
            - name: Node
              local:
                - name: LocalBuffer
                  class: SRAM
                  attributes: # 1KB
                    depth: 256
                    width: 32
                    block-size: 4
                    word-bits: 8
              subtree:
                - name: PE
                  local:
                    - name: RegFile
                      class: regfile
                      attributes:
                        depth: 1
                        width: 8
                        block-size: 1
                        word-bits: 8
                    - name: MAC
                      class: intmac
                      attributes:
                        datawidth: 8
problem:
  shape:
    name: MatMul
    dimensions: [ M, K, N ]
    data-spaces:
      - name: Weights
        projection:
          - [ [K] ]
          - [ [N] ]
      - name: Inputs
        projection:
          - [ [M] ]
          - [ [K] ]
      - name: Outputs
        projection:
          - [ [M] ]
          - [ [N] ]
        read-write: True
  instance:
    M: 128
    K: 16
    N: 32
mapping:
  # DRAM
  - target: Dram # arch-constraint: Input & Output bypassing
    type: bypass
    keep: [Weights]
    bypass: [Inputs, Outputs]
  - target: Dram
    type: temporal
    factors: M=1 K=1 N=1
    permutation: KNM
  # GlobalBuffer
  - target: GlobalBuffer # arch-constraint: Weight bypassing
    type: bypass
    keep: [Inputs, Outputs]
    bypass: [Weights]
  - target: GlobalBuffer
    type: temporal
    factors: M=4 K=1 N=1
    permutation: KNM
  # LocalBuffer
  - target: LocalBuffer # arch-constraint: Output bypassing
    type: bypass
    keep: [Inputs, Weights]
    bypass: [Outputs]
  - target: LocalBuffer
    type: temporal
    factors: M=32 K=16 N=32
    permutation: KNM
  # RegFile
  - target: RegFile # arch-constraint: OS dataflow
    type: temporal
    factors: M=1 K=1 N=1
    permutation: KNM
  - target: RegFile # arch-constraint: OS dataflow
    type: bypass
    keep: [Outputs]
    bypass: [Inputs, Weights]
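As a quick cross-check of the configuration above, here is a minimal Python sketch. The dictionaries are transcribed by hand from the YAML (nothing here uses Timeloop's own parser), and the helper names are illustrative only. It confirms that the per-level temporal factors multiply out to the problem instance, and lists which storage levels keep each data space under the bypass directives. With Dram bypassing Inputs and Outputs, GlobalBuffer becomes the outermost level holding them, so there is no parent level to fill them from, which is why 0 fills were expected there.

```python
from math import prod

# Problem instance from the YAML above.
instance = {"M": 128, "K": 16, "N": 32}

# Temporal factors per level, outermost to innermost (transcribed by hand).
factors = {
    "Dram":         {"M": 1,  "K": 1,  "N": 1},
    "GlobalBuffer": {"M": 4,  "K": 1,  "N": 1},
    "LocalBuffer":  {"M": 32, "K": 16, "N": 32},
    "RegFile":      {"M": 1,  "K": 1,  "N": 1},
}

# Data spaces each level keeps, per the bypass directives.
keep = {
    "Dram":         {"Weights"},
    "GlobalBuffer": {"Inputs", "Outputs"},
    "LocalBuffer":  {"Inputs", "Weights"},
    "RegFile":      {"Outputs"},
}

def check_factors(instance, factors):
    """Return {dim: product of that dim's factors across all levels}."""
    return {d: prod(level[d] for level in factors.values()) for d in instance}

def levels_keeping(keep, dataspace):
    """Levels (outermost first, by dict order) that keep `dataspace`."""
    return [lvl for lvl, kept in keep.items() if dataspace in kept]

products = check_factors(instance, factors)
assert products == instance, products  # factors must tile the full problem

for ds in ("Weights", "Inputs", "Outputs"):
    lvls = levels_keeping(keep, ds)
    print(f"{ds:8s} kept at {lvls}; outermost = {lvls[0]}")
```

It reports Dram as the outermost level for Weights and GlobalBuffer as the outermost level for both Inputs and Outputs.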
Those numbers indicate an integer error (probably an uninitialized value). Was this with HEAD? Or with v3.0?
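The specific value reported for Scalar fills is worth a closer look. The plain arithmetic below (no Timeloop code involved) shows that 18446744073709547520 is exactly 2^64 - 4096, which is how a count of -4096 appears when stored in an unsigned 64-bit counter, and that 4096 = M x N = 128 x 32 is the size of the Outputs tensor. This hints at a negative fill count wrapping around, though that is an inference from the numbers only, not a confirmed root cause; `as_uint64` is a hypothetical helper for illustration.

```python
# Reported "Scalar fills (per-instance)" for Outputs at GlobalBuffer.
reported = 18446744073709547520

# Problem dimensions from the YAML instance.
M, K, N = 128, 16, 32
outputs_size = M * N  # Outputs are projected onto [M] and [N]

def as_uint64(x: int) -> int:
    """Reinterpret a Python int as an unsigned 64-bit value (wraps negatives)."""
    return x % (1 << 64)

# A count of -4096 stored in a uint64 wraps to 2**64 - 4096.
wrapped = as_uint64(-outputs_size)
print(outputs_size)          # 4096
print(wrapped == reported)   # True
print(2**64 - reported)      # 4096
```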
It was with HEAD.
Ok. Here are the next steps:
- Please try it with v3.0 and let us know if it works. If it does, I suggest you continue your work using that release.
- Whether it works with v3.0 or not, please upload a single concatenated .YAML file with the simplest possible configuration that reproduces the error. Please upload an actual YAML file instead of cutting-and-pasting its contents into a comment.
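For the single concatenated file requested in the second step, one option is to join the separate architecture, problem, and mapping files as text. Below is a minimal Python sketch; `concat_yaml` and the file names are hypothetical, and it assumes each input file has its own distinct top-level key (architecture:, problem:, mapping:), so plain concatenation yields one valid YAML document.

```python
from pathlib import Path

def concat_yaml(inputs, output):
    """Concatenate YAML files (each with a distinct top-level key) into one file.

    Text is joined as-is, so comments and formatting are preserved;
    a blank line separates the parts.
    """
    parts = []
    for path in inputs:
        text = Path(path).read_text()
        parts.append(text.rstrip("\n") + "\n")
    Path(output).write_text("\n".join(parts))
    return output

# Usage (hypothetical file names):
#   concat_yaml(["arch.yaml", "problem.yaml", "mapping.yaml"], "bug.yaml")
```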
I have tried v3.0 and the same bug still exists.
bug.zip contains bug.yaml, the scaled-down case that still reproduces the error described above, and timeloop-model.stats.txt, which contains the erroneous output.
Could you please try the v3_outermost_fill_fix branch and let me know if it's fixed? Please leave the bug open until we merge this into master and v3.0.
It's OK now on my side. Thanks.
Fix pulled into new hotfix release v3.0.1 as well as master branch. Closing.