NVlabs/timeloop

Bypassing DRAM incurs an unreasonably large number of scalar fills in SRAM

CaiTH0618 opened this issue · 7 comments

Hello.

I was trying to make Inputs and Outputs bypass DRAM (it makes sense to let intermediate results stay on-chip between layers), but the simulated energy per compute was unreasonably large: pJ/Compute = 1011585881926010.750.


I checked timeloop-model.stats.txt and found that the Scalar fills (per-instance) for Outputs at the GlobalBuffer level was 18446744073709547520, which should have been 0 given my mapping. The timeloop-model.map.txt output looked correct, though.
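For what it's worth, the reported fill count sits suspiciously close to 2^64, which is consistent with the maintainer's diagnosis below of an integer error: a small negative value stored in a 64-bit unsigned counter wraps around to a huge number. A quick sanity check (just arithmetic, not Timeloop code):

```python
# The reported Scalar fills value from timeloop-model.stats.txt.
reported = 18446744073709547520

# Distance below 2**64: if a signed value of -4096 were stored in a
# 64-bit unsigned counter, it would wrap to exactly this number.
wrap_distance = 2**64 - reported
print(wrap_distance)                 # 4096
print(reported == (-4096) % 2**64)   # True
```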


Then I undid the bypass setting for DRAM, and everything went back to normal: pJ/Compute = 8.226.

Can someone explain why? Thank you.

Below are my YAML files:

architecture:
  version: 0.3

  subtree:
  - name: System
    
    local:
    - name: Dram
      class: DRAM
      attributes:
        type: LPDDR4
        width: 32
        block-size: 4
        word-bits: 8

    subtree:
    - name: Chip
      attributes:
        technology: 40nm

      local: 
      - name: GlobalBuffer
        class: SRAM
        attributes:  # 1MB
          depth: 32768
          width: 32
          block-size: 4
          word-bits: 8

      subtree:
      - name: Node

        local:
        - name: LocalBuffer
          class: SRAM
          attributes:  # 1KB
            depth: 256
            width: 32
            block-size: 4
            word-bits: 8

        subtree:
        - name: PE

          local:
          - name: RegFile
            class: regfile
            attributes:
              depth: 1
              width: 8
              block-size: 1
              word-bits: 8
          - name: MAC
            class: intmac
            attributes:
              datawidth: 8
problem:
  shape:
    name: MatMul
    dimensions: [ M, K, N ]
    data-spaces:
    - name: Weights
      projection:
      - [ [K] ]
      - [ [N] ]
    - name: Inputs
      projection:
      - [ [M] ]
      - [ [K] ]
    - name: Outputs
      projection:
      - [ [M] ]
      - [ [N] ]
      read-write: True

  instance:
    M: 128
    K: 16
    N: 32
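(For reference, the data-space projections above describe a plain matrix multiply: Weights is indexed by (K, N), Inputs by (M, K), and Outputs, marked read-write, by (M, N). A minimal sketch of the computation the mapping has to cover, with names mirroring the data-spaces in the YAML:

```python
# Problem instance from the YAML above.
M, K, N = 128, 16, 32

def matmul(inputs, weights):
    """Reference MatMul matching the projections:
    Inputs[m][k], Weights[k][n], Outputs[m][n] (read-write)."""
    outputs = [[0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            for k in range(K):
                outputs[m][n] += inputs[m][k] * weights[k][n]
    return outputs
```
)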
mapping:

  # DRAM

  - target: Dram  # arch-constraint: Input & Output bypassing
    type: bypass
    keep: [Weights]
    bypass: [Inputs, Outputs]

  - target: Dram
    type: temporal
    factors: M=1 K=1 N=1
    permutation: KNM

  # GlobalBuffer
  
  - target: GlobalBuffer  # arch-constraint: Weight bypassing
    type: bypass
    keep: [Inputs, Outputs]
    bypass: [Weights]

  - target: GlobalBuffer
    type: temporal
    factors: M=4 K=1 N=1
    permutation: KNM

  # LocalBuffer

  - target: LocalBuffer  # arch-constraint: Output bypassing
    type: bypass
    keep: [Inputs, Weights]
    bypass: [Outputs]

  - target: LocalBuffer
    type: temporal
    factors: M=32 K=16 N=32
    permutation: KNM

  # RegFile

  - target: RegFile  # arch-constraint: OS dataflow
    type: temporal
    factors: M=1 K=1 N=1
    permutation: KNM

  - target: RegFile  # arch-constraint: OS dataflow
    type: bypass
    keep: [Outputs]
    bypass: [Inputs, Weights]
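(As a sanity check that the mapping itself is well-formed, the per-level temporal factors should multiply out to the problem instance, M=128, K=16, N=32. A quick check of the numbers above:

```python
# Temporal factors per level, copied from the mapping above.
factors = {
    "Dram":         {"M": 1,  "K": 1,  "N": 1},
    "GlobalBuffer": {"M": 4,  "K": 1,  "N": 1},
    "LocalBuffer":  {"M": 32, "K": 16, "N": 32},
    "RegFile":      {"M": 1,  "K": 1,  "N": 1},
}
instance = {"M": 128, "K": 16, "N": 32}

# Each dimension's factors across all levels must multiply to its
# problem-instance size.
for dim, total in instance.items():
    prod = 1
    for level in factors.values():
        prod *= level[dim]
    assert prod == total, f"{dim}: {prod} != {total}"
print("factors multiply out to the problem instance")
```

So the mapping is consistent; the bad fill count comes from Timeloop's modeling, not from an ill-formed factorization.)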

Those numbers indicate an integer error (probably an uninitialized value). Was this with HEAD? Or with v3.0?

It was with HEAD.

Ok. Here are the next steps:

  1. Please try it with v3.0 and let us know if it works. If it works I suggest you continue your work using that release.
  2. Whether it works with v3.0 or not, please upload a single concatenated .yaml file with the simplest possible configuration that reproduces the error. Please upload an actual YAML file instead of cutting-and-pasting its contents into a comment.

I have tried v3.0 and the same bug still exists.

bug.zip contains

  • bug.yaml which is the scaled-down case that still reproduces the error described above;
  • timeloop-model.stats.txt which contains the error output.

bug.zip

Could you please try the v3_outermost_fill_fix branch and let me know if it's fixed? Please leave the bug open until we merge this into master and v3.0.

It's OK now on my side. Thanks.

Fix pulled into new hotfix release v3.0.1 as well as master branch. Closing.