# Reduction counts are not as expected
hqjenny opened this issue · 0 comments
When the reported reductions are summed across all levels, the total does not equal the theoretical minimum, which is (# of elementwise ops) - (# of output elements).
Here are two examples:
- Temporal reduction enabled at both the RegisterFile and GlobalBuffer levels
MainMemory [ Weights:3072 (3072) Inputs:576 (576) Outputs:512 (512) ]
---------------------------------------------------------------------
| for P in [0:1)
GlobalBuffer [ Weights:3072 (3072) Inputs:576 (576) Outputs:512 (512) ]
-----------------------------------------------------------------------
| for C in [0:32)
| for K in [0:2)
| for R in [0:3)
| for K in [0:16) (Spatial-X)
RegisterFile [ Weights:1 (1) Inputs:16 (16) Outputs:16 (16) ]
-------------------------------------------------------------
| for P in [0:16)
Consider this mapping for Timeloop tutorial exercise 4. The # of temporal reductions at the RegisterFile level (for each instance) is 3040. It is calculated as 3072 (content_accesses) + 0 (peer_accesses) - 32 (partition_size) = 3040. The content accesses are (P=16)*(R=3)*(K=2)*(C=32)=3072. The total # of reductions at the RegisterFile level is 3040 (reductions per instance) * 16 (instances) = 48640. This would be correct if there were no reduction capability at the GlobalBuffer level, but the # of temporal reductions reported at the GlobalBuffer level is 15872, meaning that reduction is enabled at the GlobalBuffer level.
Therefore, the number of reductions at the RegisterFile level is incorrect.
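The arithmetic for this first example can be checked with a short script. This is only a sketch: the workload shape (C=32, K=32, R=3, P=16) is inferred from the loop bounds in the mapping, and the variable names are made up for illustration, not taken from Timeloop.

```python
# Workload shape inferred from the loop bounds in the mapping (an assumption):
# C=32, K=2*16=32, R=3, P=16.
C, K, R, P = 32, 2 * 16, 3, 16

macs = C * K * R * P              # elementwise ops
outputs = K * P                   # output elements; matches Outputs:512
min_reductions = macs - outputs   # theoretical minimum

# Numbers quoted in the issue for this mapping.
content_accesses = 16 * 3 * 2 * 32   # (P=16)*(R=3)*(K=2)*(C=32)
peer_accesses = 0
partition_size = 32
instances = 16
gb_temporal = 15872                  # reported at the GlobalBuffer level

rf_per_instance = content_accesses + peer_accesses - partition_size
rf_total = rf_per_instance * instances
reported_total = rf_total + gb_temporal

print("theoretical minimum:", min_reductions)
print("RegisterFile total: ", rf_total)
print("reported total:     ", reported_total)
print("excess over minimum:", reported_total - min_reductions)
```

Under these assumptions the RegisterFile total alone already equals the theoretical minimum, so every reduction reported at the GlobalBuffer level is counted on top of it.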
- Mapping that introduces spatial reductions
MainMemory [ Weights:3072 (3072) Inputs:576 (576) Outputs:512 (512) ]
---------------------------------------------------------------------
| for P in [0:1)
GlobalBuffer [ Weights:3072 (3072) Inputs:576 (576) Outputs:512 (512) ]
-----------------------------------------------------------------------
| for C in [0:2)
| for K in [0:32)
| for R in [0:3)
| for C in [0:16) (Spatial-X)
RegisterFile [ Weights:1 (1) Inputs:16 (16) Outputs:16 (16) ]
-------------------------------------------------------------
| for P in [0:16)
The reported # of spatial reductions for this mapping is 15360. The # of temporal reductions at the RegisterFile level is 2560, and at the GlobalBuffer level it is 512. If we use the following formulation to calculate the total # of reductions: 2560*16+512+15360=56832, then more reductions are reported than the theoretical minimum requires.
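The same kind of check for the spatial-reduction mapping, again as a sketch. The workload shape (C=2*16=32, K=32, R=3, P=16) is inferred from the loop bounds, and the helper name `total_reductions` is hypothetical.

```python
def total_reductions(rf_per_instance, instances, gb_temporal, spatial):
    """Sum the reported reductions the way the issue does."""
    return rf_per_instance * instances + gb_temporal + spatial

# Workload shape inferred from the mapping (an assumption).
C, K, R, P = 2 * 16, 32, 3, 16
min_reductions = C * K * R * P - K * P   # elementwise ops minus output elements

# Values reported for this mapping.
reported = total_reductions(rf_per_instance=2560, instances=16,
                            gb_temporal=512, spatial=15360)

print("theoretical minimum:", min_reductions)
print("reported total:     ", reported)
print("excess over minimum:", reported - min_reductions)
```

With these inputs the reported total exceeds the assumed theoretical minimum, which is the discrepancy described above.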