NVlabs/timeloop

Mapping does not match constraints

SkyeLantian opened this issue · 7 comments

Hello, I ran into a problem when using the mapper. I want to get the mapping and performance of an output-stationary accelerator in which 16 inputs are multicast vertically and 16 kernels of weights are multicast horizontally into the PE array.

The constraint is like this:

mapspace:
constraints:
targets:
#Datatype Bypass

  - target: RegFile
    type: bypass
    keep: [Inputs, Weights, Outputs]

  - target: GlobalBuffer
    type: bypass
    keep: [Inputs, Weights, Outputs]

  - target: DRAM
    type: bypass
    keep: [Inputs, Weights, Outputs]

# Temporal

  - target: RegFile
    type: temporal
    factors: R=3 S=3 C=64 P=1 Q=1 K=1 N=1
    permutation: RSCPQKN
  - target: GlobalBuffer
    type: temporal
    factors: R=1 S=1 C=1 P=50 Q=50 K=8 N=1
    permutation: RSCPQKN
  - target: DRAM
    type: temporal
    factors: K=1 N=1
    permutation: KN

# Spatial

  - target: GlobalBuffer
    type: spatial
    factors: K=16 N=16 S=1 R=1 C=1
    permutation: NRSCK
    split: 1

The mapping is like this:
| for Q in [0:5)
| for C in [0:2)
| for S in [0:3)

GlobalBuffer [ Weights:12288 (12288) Inputs:266240 (266240) Outputs:1024000 (1024000) ]

| for P in [0:25)
| for Q in [0:10)
| for C in [0:2)
| for C in [0:2) (Spatial-Y)
| for R in [0:3) (Spatial-Y)
| for K in [0:2) (Spatial-Y)
| for N in [0:8) (Spatial-X)
| for P in [0:2) (Spatial-X)

RegFile [ Inputs:16 (16) Outputs:128 (128) ]

| for N in [0:2)
| for K in [0:64)
| for C in [0:8)

I think the spatial part should be: for K in [0:16) (Spatial-Y), for N in [0:16) (Spatial-X). Also, in the RegFile part, the loop bounds differ from the factors I set and exceed them for K and N.
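The mismatch described above can be checked mechanically. The following Python sketch is illustrative only: the two dictionaries are hand-transcribed from the constraint and mapping listings in this post, the helper name `find_mismatches` is hypothetical, and a dimension that does not appear in the reported loop nest is assumed to have a bound of 1. It does not call any Timeloop API.

```python
# Factors requested in the constraints (transcribed from the listing above).
requested = {
    ("RegFile", "temporal"): dict(R=3, S=3, C=64, P=1, Q=1, K=1, N=1),
    ("GlobalBuffer", "spatial"): dict(K=16, N=16, S=1, R=1, C=1),
}

# Loop bounds reported in the mapping (transcribed from the listing above).
reported = {
    ("RegFile", "temporal"): dict(N=2, K=64, C=8),
    ("GlobalBuffer", "spatial"): dict(C=2, R=3, K=2, N=8, P=2),
}


def find_mismatches(requested, reported):
    """Return (level, dim, wanted, got) for every constrained factor that differs."""
    result = []
    for level, wanted_factors in requested.items():
        got_factors = reported.get(level, {})
        for dim, wanted in wanted_factors.items():
            got = got_factors.get(dim, 1)  # assumption: absent loop means bound 1
            if got != wanted:
                result.append((level, dim, wanted, got))
    return result


mismatches = find_mismatches(requested, reported)
for level, dim, wanted, got in mismatches:
    print(f"{level[0]} ({level[1]}): {dim} requested {wanted}, mapped {got}")
```

Nearly every constrained factor differs, which suggests the constraints were not applied at all rather than being partially violated.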

May I ask why the mapping does not match the constraint?

Could you please attach a single YAML with the entire concatenated config (arch, problem, constraints)?

dataflow
VGG.txt
Here is the dataflow I want to simulate, along with the configs.

Your constraints have incorrect YAML indentation. The key nesting is also not right. Just use mapspace_constraints -> targets.

Once I fix that, Timeloop is not able to find any legal mappings that satisfy your constraints. This is because your mapping exceeds the RF capacity.
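To see roughly why the RegFile constraint is so demanding, the tile it implies can be estimated by hand. The sketch below is a back-of-the-envelope calculation, not Timeloop's own tile analysis. It assumes a standard convolution with unit stride and dilation, that the RegFile tile is set by the temporal factors constrained at that level, and that all three datatypes share one capacity. The function names are hypothetical, and the actual RegFile size must be read from the architecture file, which is not reproduced in this thread.

```python
def conv_tile_footprint(f, stride=1, dilation=1):
    """Words per datatype for a convolution tile with loop bounds f (R,S,C,P,Q,K,N)."""
    # The input tile spans the sliding window over the output tile (R pairs with P, S with Q).
    in_p = (f["P"] - 1) * stride + (f["R"] - 1) * dilation + 1
    in_q = (f["Q"] - 1) * stride + (f["S"] - 1) * dilation + 1
    return {
        "Weights": f["R"] * f["S"] * f["C"] * f["K"],
        "Inputs": in_p * in_q * f["C"] * f["N"],
        "Outputs": f["P"] * f["Q"] * f["K"] * f["N"],
    }


def exceeds_capacity(footprint, capacity_words):
    """True if the summed tile does not fit (assumes a shared, unpartitioned buffer)."""
    return sum(footprint.values()) > capacity_words


# Temporal factors requested for RegFile in the constraints above.
regfile_factors = dict(R=3, S=3, C=64, P=1, Q=1, K=1, N=1)
footprint = conv_tile_footprint(regfile_factors)
total_words = sum(footprint.values())
print(footprint)
print("total words per PE:", total_words)
```

Passing the RegFile depth from the architecture spec to `exceeds_capacity` shows whether the constrained tile can fit. With C=64 kept entirely in the RegFile, the weights and inputs alone need over a thousand words per PE.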

Thanks for your help. I still have a question about imperfect factorization. If a layer has 1000 kernels, I want to multicast 16 kernels to the PE array in each iteration and the remaining 8 kernels in the last iteration. Can I use factors: K=16,8 in the spatial constraint? I tried a constraint like this, but Timeloop reported that 1000 is not divisible by 16.
VGG.txt

The master branch supports imperfect factorization only in timeloop-model with a user-specified mapping. Imperfect factorization with timeloop-mapper is only supported in the ruby branch. Could you give that a shot?
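The arithmetic behind the divisibility error, and what an imperfect factorization of K would look like, can be sketched in plain Python. This only illustrates the loop-bound arithmetic for K=1000 on a 16-wide fanout. The function names are hypothetical, and the code neither invokes Timeloop nor reproduces the search performed by the ruby branch.

```python
def perfect_spatial_factors(dim_size, max_fanout):
    """Spatial factors up to max_fanout that divide dim_size exactly."""
    return [f for f in range(1, max_fanout + 1) if dim_size % f == 0]


def imperfect_passes(dim_size, fanout):
    """Tile size of each pass when the final pass is allowed to be partial."""
    full_passes, remainder = divmod(dim_size, fanout)
    return [fanout] * full_passes + ([remainder] if remainder else [])


K, FANOUT = 1000, 16

# Perfect factorization: only exact divisors of K are legal spatial factors.
perfect = perfect_spatial_factors(K, FANOUT)
print("legal perfect factors:", perfect)

# Imperfect factorization: full passes of 16, then one partial pass.
passes = imperfect_passes(K, FANOUT)
utilization = K / (len(passes) * FANOUT)
print(f"{len(passes)} passes, last pass uses {passes[-1]} of {FANOUT}")
print(f"average spatial utilization: {utilization:.3f}")
```

Under perfect factorization the largest legal spatial factor for K within a fanout of 16 is 10, leaving 6 of the 16 positions idle in every pass. Allowing a partial final pass keeps all 16 busy for 62 passes and 8 busy in the last one.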

Sorry, may I ask what the ruby branch is?

It's the imperfect-factorization mapper. The work was led by Mark Horeni at the University of Notre Dame and published in ISPASS 2022. https://research.nvidia.com/publication/2022-06_ruby-improving-hardware-efficiency-tensor-algebra-accelerators-through

We are still working on integrating it into the main branch.