.compute_at() failed: "Sibling fusion strategy requires a specific memref"
Closed this issue · 6 comments
The following example shows an attempt to merge two independent loop bands with compute_at.
module {
  func @top() {
    %0 = memref.alloc() : memref<32x32xf32>
    %1 = memref.alloc() : memref<32x32xf32>
    %2 = hcl.create_stage_handle "B" : !hcl.StageHandle
    %3 = hcl.create_loop_handle "i" : !hcl.LoopHandle
    %4 = hcl.create_loop_handle "j" : !hcl.LoopHandle
    affine.for %arg0 = 0 to 32 {
      affine.for %arg1 = 0 to 32 {
        %9 = memref.load %0[%arg0, %arg1] : memref<32x32xf32>
        %cst = constant 1.000000e+00 : f32
        %10 = addf %9, %cst : f32
        memref.store %10, %1[%arg0, %arg1] : memref<32x32xf32>
      } {loop_name = "j"}
    } {loop_name = "i", stage_name = "B"}
    %5 = memref.alloc() : memref<32x32xf32>
    %6 = hcl.create_stage_handle "C" : !hcl.StageHandle
    %7 = hcl.create_loop_handle "i" : !hcl.LoopHandle
    %8 = hcl.create_loop_handle "j" : !hcl.LoopHandle
    affine.for %arg0 = 0 to 32 {
      affine.for %arg1 = 0 to 32 {
        %9 = memref.load %0[%arg0, %arg1] : memref<32x32xf32>
        %cst = constant 1.000000e+00 : f32
        %10 = addf %9, %cst : f32
        memref.store %10, %5[%arg0, %arg1] : memref<32x32xf32>
      } {loop_name = "j"}
    } {loop_name = "i", stage_name = "C"}
    hcl.compute_at(%2, %6, %7)
    return
  }
}
Basically, it computes B=A+1 and C=A+1. Both stages only read A, so this is a RAR (read-after-read) pattern and the loops can be safely merged, but hcl-opt throws an error.
LoopFusionUtils.h:79: mlir::FusionStrategy::FusionStrategy(mlir::FusionStrategy::StrategyEnum): Assertion `strategy != Sibling && "Sibling fusion strategy requires a specific memref"' failed.
Aborted
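For reference, here is a hand-written sketch of the fused form the example is aiming for. It is not hcl-opt output. It assumes that compute_at at loop "i" of stage C shares only the outer loop and keeps the two "j" loops separate; the comments and the renamed SSA values (%11, %12, %cst_0) are added for illustration only.

```mlir
module {
  func @top() {
    %0 = memref.alloc() : memref<32x32xf32>  // A, read by both stages
    %1 = memref.alloc() : memref<32x32xf32>  // B
    %5 = memref.alloc() : memref<32x32xf32>  // C
    affine.for %arg0 = 0 to 32 {
      // Stage B body, moved under stage C's "i" loop (assumed placement).
      affine.for %arg1 = 0 to 32 {
        %9 = memref.load %0[%arg0, %arg1] : memref<32x32xf32>
        %cst = constant 1.000000e+00 : f32
        %10 = addf %9, %cst : f32
        memref.store %10, %1[%arg0, %arg1] : memref<32x32xf32>
      } {loop_name = "j"}
      // Stage C body, unchanged.
      affine.for %arg1 = 0 to 32 {
        %11 = memref.load %0[%arg0, %arg1] : memref<32x32xf32>
        %cst_0 = constant 1.000000e+00 : f32
        %12 = addf %11, %cst_0 : f32
        memref.store %12, %5[%arg0, %arg1] : memref<32x32xf32>
      } {loop_name = "j"}
    } {loop_name = "i", stage_name = "C"}
    return
  }
}
```

The two inner loops write to different buffers and only share reads of %0, which is why the merge should be legal.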
Can you take a look at it? @zzzDavid
Even after I changed the compute rule to C=B+1, the same error occurred.
This issue is caused by memref.load and memref.store: the dependency analysis utility function only takes affine.load and affine.store into account, so the program treated this as a sibling fusion, hence the error. I'll fix this soon.
I added dependency analysis support for memref.load and memref.store in b12607e, but compute_at still fails.
The problem is that Affine's canFuseLoops and fuseLoops utility functions are built for affine.load and affine.store. We will need to extend these loop utility functions to support the memref interface later.
For now, I suggest we generate affine.load/affine.store in the front-end.
I tried adding memref support to canFuseLoops and fuseLoops, but a new problem appeared: MemRefAccess, the struct used to analyze memref accesses, only supports affine load/store: https://github.com/llvm/llvm-project/blob/0e19186c82a8e6b403788aa9f24752cbc3bb2dc9/mlir/lib/Analysis/Utils.cpp#L1227
Maintaining our own copy of the MemRefAccess implementation would require us to reimplement all the classes/functions that use it. I suggest we try generating affine.load and affine.store from the front-end.
Okay, I will check later whether generating affine.load and affine.store is a workable solution.
I've changed all the generated memref.load/store to affine.load/store in commit ef07366c06192c780ae6a9067bc1c8b00b028af2, and the above example now works.
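As an illustration, the original example rewritten with affine.load/affine.store would look roughly like the sketch below. It is derived by hand from the IR at the top of this issue by swapping the two ops; the exact IR emitted by the front-end at that commit is not shown in this thread and may differ.

```mlir
module {
  func @top() {
    %0 = memref.alloc() : memref<32x32xf32>
    %1 = memref.alloc() : memref<32x32xf32>
    %2 = hcl.create_stage_handle "B" : !hcl.StageHandle
    %3 = hcl.create_loop_handle "i" : !hcl.LoopHandle
    %4 = hcl.create_loop_handle "j" : !hcl.LoopHandle
    affine.for %arg0 = 0 to 32 {
      affine.for %arg1 = 0 to 32 {
        // affine.load/affine.store are visible to the Affine fusion utilities.
        %9 = affine.load %0[%arg0, %arg1] : memref<32x32xf32>
        %cst = constant 1.000000e+00 : f32
        %10 = addf %9, %cst : f32
        affine.store %10, %1[%arg0, %arg1] : memref<32x32xf32>
      } {loop_name = "j"}
    } {loop_name = "i", stage_name = "B"}
    %5 = memref.alloc() : memref<32x32xf32>
    %6 = hcl.create_stage_handle "C" : !hcl.StageHandle
    %7 = hcl.create_loop_handle "i" : !hcl.LoopHandle
    %8 = hcl.create_loop_handle "j" : !hcl.LoopHandle
    affine.for %arg0 = 0 to 32 {
      affine.for %arg1 = 0 to 32 {
        %9 = affine.load %0[%arg0, %arg1] : memref<32x32xf32>
        %cst = constant 1.000000e+00 : f32
        %10 = addf %9, %cst : f32
        affine.store %10, %5[%arg0, %arg1] : memref<32x32xf32>
      } {loop_name = "j"}
    } {loop_name = "i", stage_name = "C"}
    hcl.compute_at(%2, %6, %7)
    return
  }
}
```

The subscripts here are plain loop induction variables, so they are valid affine indices and the load/store ops can be swapped one for one.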
However, the current solution is temporary: MLIR still has no official Python binding for the Affine dialect, so I generated the bindings for the Affine operators myself. There are also some bugs in the generated code that need to be fixed manually.
For example, the AffineLoadOp binding lacks an AffineMap attribute (see LLVM issue). So I copied the generated TableGen code into our repo and fixed the issue there, which is not that elegant.