Mapping does not match constraints

Question

SkyeLantian opened this issue a year ago · 7 comments

Hello, I ran into a problem when using the mapper. I want to get the mapping and performance of an accelerator that uses an output-stationary dataflow, where 16 inputs are multicast vertically and 16 kernels of weights are multicast horizontally into the PE array.

The constraint is like this:

mapspace:
constraints:
targets:
#Datatype Bypass

target: RegFile
type: bypass
keep: [Inputs, Weights, Outputs]
target: GlobalBuffer
type: bypass
keep: [Inputs, Weights, Outputs]
target: DRAM
type: bypass
keep: [Inputs, Weights, Outputs]

Temporal

target: RegFile
type: temporal
factors: R=3 S=3 C=64 P=1 Q=1 K=1 N=1
permutation: RSCPQKN
target: GlobalBuffer
type: temporal
factors: R=1 S=1 C=1 P=50 Q=50 K=8 N=1
permutation: RSCPQKN
target: DRAM
type: temporal
factors: K=1 N=1
permutation: KN

Spatial

target: GlobalBuffer
type: spatial
factors: K=16 N=16 S=1 R=1 C=1
permutation: NRSCK
split: 1

The mapping is like this:
| for Q in [0:5)
| for C in [0:2)
| for S in [0:3)

GlobalBuffer [ Weights:12288 (12288) Inputs:266240 (266240) Outputs:1024000 (1024000) ]

RegFile [ Inputs:16 (16) Outputs:128 (128) ]

| for N in [0:2)
| for K in [0:64)
| for C in [0:8)

I think the spatial part should be: for K in [0:16) (Spatial-Y), for N in [0:16) (Spatial-X). Besides, in the RegFile part, the loop bounds exceed the factors I set.

May I ask why the mapping does not match the constraint?
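As a quick sanity check on the constraints above, the sketch below multiplies each dimension's factors across all levels to see which problem shape they imply. It assumes the perfect-factorization rule, under which the product of a dimension's factors must equal that dimension's bound in the problem spec. The helper names are made up for illustration and are not part of Timeloop.

```python
from collections import defaultdict

# Factor strings copied from the temporal and spatial constraints in the question.
constraint_factors = {
    "RegFile (temporal)": "R=3 S=3 C=64 P=1 Q=1 K=1 N=1",
    "GlobalBuffer (temporal)": "R=1 S=1 C=1 P=50 Q=50 K=8 N=1",
    "DRAM (temporal)": "K=1 N=1",
    "GlobalBuffer (spatial)": "K=16 N=16 S=1 R=1 C=1",
}

def parse_factors(text):
    """Turn 'R=3 S=3' into {'R': 3, 'S': 3}."""
    return {dim: int(val) for dim, val in (item.split("=") for item in text.split())}

def implied_bounds(factors_by_level):
    """Multiply each dimension's factors across all levels."""
    bounds = defaultdict(lambda: 1)
    for text in factors_by_level.values():
        for dim, factor in parse_factors(text).items():
            bounds[dim] *= factor
    return dict(bounds)

bounds = implied_bounds(constraint_factors)
print(bounds)
```

If the printed bounds differ from the dimensions in the problem file, or if the reported mapping shows loops such as K in [0:64) under RegFile where the constraint asks for K=1, that suggests the constraints were not applied as written.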

Answer 1 · 2023-04-18T17:00:07.000Z

Could you please attach a single YAML with the entire concatenated config (arch, problem, constraints)?

Answer 2 · 2023-04-19T02:24:10.000Z

VGG.txt
This is the dataflow I want to simulate, along with the configs.

Answer 3 · 2023-04-19T15:15:20.000Z

Your constraints have incorrect YAML indentation. The key nesting is also not right. Just use mapspace_constraints -> targets.

Once I fix that, Timeloop is not able to find any legal mappings that satisfy your constraints. This is because your mapping exceeds the RF capacity.
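To see roughly why the RF overflows, here is a minimal sketch that estimates the RegFile tile footprint from the temporal factors in the question. It assumes a standard convolution with unit stride and no dilation, and that all three datatypes are kept in the RegFile, as the bypass constraint requests. The function names are hypothetical, and the actual RF depth has to come from the architecture file, which is not shown in this thread.

```python
def conv_tile_words(R, S, C, P, Q, K, N, stride=1):
    """Words each tensor needs to hold one convolution tile (dilation 1 assumed)."""
    in_h = (P - 1) * stride + R
    in_w = (Q - 1) * stride + S
    return {
        "Weights": R * S * C * K,
        "Inputs": in_h * in_w * C * N,
        "Outputs": P * Q * K * N,
    }

def fits_in(tile, capacity_words):
    """True if the combined tile fits in a buffer of `capacity_words` words."""
    return sum(tile.values()) <= capacity_words

# RegFile temporal factors from the question. RegFile is the innermost level,
# so these factors alone determine its tile.
tile = conv_tile_words(R=3, S=3, C=64, P=1, Q=1, K=1, N=1)
total = sum(tile.values())

print(tile)   # per-tensor footprint in words
print(total)  # combined footprint in words
```

Calling fits_in with the RegFile depth from the architecture file shows whether the requested tile can be held. Under these assumptions the tile needs more than a thousand words, which is far more than a small per-PE register file typically provides.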

Answer 4 · 2023-04-20T08:44:25.000Z

Thanks for your help. I still have a problem with imperfect factorization. If a layer has 1000 kernels, I want to multicast 16 kernels to the PE array in each iteration and the remaining 8 kernels in the last iteration. Can I use factors: K=16,8 in the spatial constraint? I tried a constraint like this, but Timeloop reported that 1000 is not divisible by the product of 16.
VGG.txt

Answer 5 · 2023-04-20T14:07:28.000Z

The master branch supports imperfect factorization only in timeloop-model with a user-specified mapping. Imperfect factorization with timeloop-mapper is only supported in the ruby branch. Could you give that a shot?
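The arithmetic behind that limitation can be sketched as follows. The snippet lists the spatial factors for K that divide 1000 evenly (the only ones a perfect-factorization mapper can accept) and shows the split the question is asking for. The function names are illustrative only and do not correspond to any Timeloop API.

```python
def perfect_spatial_factors(bound, max_fanout):
    """Divisors of `bound` that fit within `max_fanout` PEs along one axis."""
    return [f for f in range(1, max_fanout + 1) if bound % f == 0]

def imperfect_split(bound, factor):
    """Number of full passes of `factor`, and the size of the final partial pass."""
    full_passes, remainder = divmod(bound, factor)
    return full_passes, remainder

K = 1000     # kernels in the layer, from the question
FANOUT = 16  # kernels multicast per pass, from the question

print(perfect_spatial_factors(K, FANOUT))  # [1, 2, 4, 5, 8, 10]
print(imperfect_split(K, FANOUT))          # (62, 8)
```

With perfect factorization, the widest usable spatial factor for K=1000 on a 16-wide axis is 10, which leaves 6 of the 16 positions idle. The imperfect split runs 62 full passes of 16 kernels followed by one pass of 8.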

Answer 6 · 2023-04-20T14:13:10.000Z

Sorry, may I ask what the ruby branch is?

Answer 7 · 2023-04-20T14:31:35.000Z

It's the imperfect-factorization mapper. The work was led by Mark Horeni at the University of Notre Dame and published in ISPASS 2022. https://research.nvidia.com/publication/2022-06_ruby-improving-hardware-efficiency-tensor-algebra-accelerators-through

We are still working on integrating it into the main branch.