cornell-zhang/hcl-dialect

.compute_at() failed: "Sibling fusion strategy requires a specific memref"

Closed this issue · 6 comments

The following example attempts to merge two independent loop bands with hcl.compute_at.

module  {
  func @top() {
    %0 = memref.alloc() : memref<32x32xf32>
    %1 = memref.alloc() : memref<32x32xf32>
    %2 = hcl.create_stage_handle "B" : !hcl.StageHandle
    %3 = hcl.create_loop_handle "i" : !hcl.LoopHandle
    %4 = hcl.create_loop_handle "j" : !hcl.LoopHandle
    affine.for %arg0 = 0 to 32 {
      affine.for %arg1 = 0 to 32 {
        %9 = memref.load %0[%arg0, %arg1] : memref<32x32xf32>
        %cst = constant 1.000000e+00 : f32
        %10 = addf %9, %cst : f32
        memref.store %10, %1[%arg0, %arg1] : memref<32x32xf32>
      } {loop_name = "j"}
    } {loop_name = "i", stage_name = "B"}
    %5 = memref.alloc() : memref<32x32xf32>
    %6 = hcl.create_stage_handle "C" : !hcl.StageHandle
    %7 = hcl.create_loop_handle "i" : !hcl.LoopHandle
    %8 = hcl.create_loop_handle "j" : !hcl.LoopHandle
    affine.for %arg0 = 0 to 32 {
      affine.for %arg1 = 0 to 32 {
        %9 = memref.load %0[%arg0, %arg1] : memref<32x32xf32>
        %cst = constant 1.000000e+00 : f32
        %10 = addf %9, %cst : f32
        memref.store %10, %5[%arg0, %arg1] : memref<32x32xf32>
      } {loop_name = "j"}
    } {loop_name = "i", stage_name = "C"}
    hcl.compute_at(%2, %6, %7)
    return
  }
}

Basically, it computes B=A+1 and C=A+1. The only buffer the two stages share is A, which both of them read, so this is a read-after-read (RAR) pattern and the loops can be safely merged. However, hcl-opt throws an error:

LoopFusionUtils.h:79: mlir::FusionStrategy::FusionStrategy(mlir::FusionStrategy::StrategyEnum): Assertion `strategy != Sibling && "Sibling fusion strategy requires a specific memref"' failed.
Aborted
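For reference, here is a hand-written sketch of the loop structure this compute_at call is meant to produce. It is not actual hcl-opt output. It assumes that attaching stage B at loop "i" of stage C places both inner "j" loops under a single shared "i" loop; the order of the two inner loops and the attributes kept on the fused loop are also assumptions.

```mlir
// Illustrative only: intended shape after hcl.compute_at(%2, %6, %7).
// %0 is A, %1 is B, %5 is C, all memref<32x32xf32> as in @top.
affine.for %arg0 = 0 to 32 {
  // Body of stage B: B[i][j] = A[i][j] + 1
  affine.for %arg1 = 0 to 32 {
    %9 = memref.load %0[%arg0, %arg1] : memref<32x32xf32>
    %cst = constant 1.000000e+00 : f32
    %10 = addf %9, %cst : f32
    memref.store %10, %1[%arg0, %arg1] : memref<32x32xf32>
  } {loop_name = "j"}
  // Body of stage C: C[i][j] = A[i][j] + 1
  affine.for %arg1 = 0 to 32 {
    %11 = memref.load %0[%arg0, %arg1] : memref<32x32xf32>
    %cst_0 = constant 1.000000e+00 : f32
    %12 = addf %11, %cst_0 : f32
    memref.store %12, %5[%arg0, %arg1] : memref<32x32xf32>
  } {loop_name = "j"}
} {loop_name = "i", stage_name = "C"}
```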

Can you take a look at it? @zzzDavid

Even after I changed the compute rule to C=B+1, the same error occurred.

This issue is caused by memref.load and memref.store. The dependency analysis utility function only takes affine.load and affine.store into account, so it found no dependence between the two loop nests, treated the case as a sibling fusion, and hit the assertion. I will fix this soon.

I added dependency analysis support for memref.load and memref.store in b12607e, but compute_at still fails.

The problem is that the Affine dialect's canFuseLoops and fuseLoops utility functions are built for affine.load and affine.store. We will need to extend these loop utilities to handle memref.load and memref.store later.

For now, I suggest we generate affine.load/affine.store in the front-end.

I tried adding memref support to canFuseLoops and fuseLoops, but ran into a new problem: MemRefAccess, the struct used to analyze memref accesses, only supports affine load/store: https://github.com/llvm/llvm-project/blob/0e19186c82a8e6b403788aa9f24752cbc3bb2dc9/mlir/lib/Analysis/Utils.cpp#L1227

Maintaining our own copy of the MemRefAccess implementation would require us to reimplement every class and function that uses it. I suggest we try generating affine.load and affine.store from the front-end instead.

Okay, I will check later whether generating affine.load and affine.store is a workable solution.

I've changed all the generated memref.load/store operations to affine.load/store in commit ef07366c06192c780ae6a9067bc1c8b00b028af2, and the above example now works.
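As a rough illustration of that change, stage B from the example would look like the following once its accesses are emitted as affine ops. This is written by hand from the original snippet, not copied from the commit, so the exact output of the front-end may differ.

```mlir
// Stage B with affine accesses instead of memref.load/memref.store.
// %0 and %1 are the memref<32x32xf32> buffers allocated in @top.
affine.for %arg0 = 0 to 32 {
  affine.for %arg1 = 0 to 32 {
    %9 = affine.load %0[%arg0, %arg1] : memref<32x32xf32>
    %cst = constant 1.000000e+00 : f32
    %10 = addf %9, %cst : f32
    affine.store %10, %1[%arg0, %arg1] : memref<32x32xf32>
  } {loop_name = "j"}
} {loop_name = "i", stage_name = "B"}
```

Because the subscripts are plain loop induction variables, they are valid affine expressions, which is what lets the affine dependence analysis and fusion utilities reason about these accesses.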

However, the current solution is temporary: MLIR still has no official Python bindings for the Affine dialect, so I generated the bindings for the Affine operations myself. The generated code also has some bugs that need to be fixed manually.
For example, the AffineLoadOp binding lacks an AffineMap attribute (see LLVM issue). I therefore copied the generated TableGen code into our repo and fixed the issue there, which is not very elegant.