
Improve `schedule/as_matmul` matching on hybrid loop-libop code

roastduck opened this issue · 1 comment

The current `schedule/as_matmul` cannot match some basic cases that mix loops and libop calls. For example:

```python
for i in range(500):
    c[i] = a[i] @ ft.transpose(b, (1, 0))
```

This code cannot be mapped to a Matmul node because the libop calls introduce intermediate local variables. After expansion, the code actually looks like this:

```python
for i in range(500):
    t = ft.transpose(b, (1, 0))
    u = a[i] @ t
    c[i] = u
```
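Written out in plain Python, the expanded loop indeed computes a single matrix product, which is why it should map to one Matmul node. The sketch below assumes `a` is 2-D (so each `a[i]` is a row vector) and uses small hand-picked shapes; `transpose` and `vecmat` are illustrative stand-ins for the libop calls, not FreeTensor functions.

```python
# Illustrative data (assumed shapes): a is 4 x 3, so each a[i] is a length-3 row; b is 2 x 3
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0, 1, 0]]
b = [[1, 0, 2], [3, 1, 0]]


def transpose(m):
    """Stand-in for ft.transpose(m, (1, 0)) on a 2-D list."""
    return [list(col) for col in zip(*m)]


def vecmat(v, m):
    """Row vector v (length K) times matrix m (K x N), giving a length-N row."""
    return [sum(v[k] * m[k][j] for k in range(len(v))) for j in range(len(m[0]))]


# Expanded loop: each libop call leaves behind an intermediate local variable
c = [None] * len(a)
for i in range(len(a)):
    t = transpose(b)     # intermediate produced by the transpose
    u = vecmat(a[i], t)  # intermediate produced by the matmul
    c[i] = u             # plain copy into the final destination

# Reference: the whole loop is one product, c[i][j] = sum_k a[i][k] * b[j][k]
c_ref = [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]
assert c == c_ref
```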

To handle this case, we need the following two changes:

  • Call `inline` on every variable inside the matching subtree. This has no unwanted side effect, because without it the matching would fail anyway. It solves the problem of `t` in this example.
  • Implement a new schedule, perhaps named `anti_inline`, that removes an intermediate variable and redirects all Stores to it to the final destination. It suffices to implement `anti_inline` only for variables that are copied to another variable without modification (like `c[i] = u`). This also has no unwanted side effect, because the matching would fail otherwise. It solves the problem of `u`.
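The two proposed schedules can be sketched on a toy IR. Everything here is hypothetical: `Load`, `Call`, `Store`, `inline`, `anti_inline`, `is_matmul` and `try_as_matmul` are illustrative names, not FreeTensor's API; indices are folded into variable names; and `reduce=True` stands in for the reduction loops an expanded matmul libop produces, which is why `u` cannot simply be inlined. The wrapper transforms a copy and returns the original body whenever matching fails, mirroring the no-side-effect argument above.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Load:
    var: str  # toy simplification: indices are folded into the name, e.g. "a[i]"


@dataclass(frozen=True)
class Call:
    op: str
    args: Tuple["Expr", ...]


Expr = Union[Load, Call]


@dataclass(frozen=True)
class Store:
    var: str
    expr: Expr
    reduce: bool = False  # True models `var += expr`, standing in for libop reduction loops


def substitute(e: Expr, name: str, value: Expr) -> Expr:
    """Replace every Load of `name` inside `e` with `value`."""
    if isinstance(e, Load):
        return value if e.var == name else e
    return Call(e.op, tuple(substitute(x, name, value) for x in e.args))


def count_loads(e: Expr, name: str) -> int:
    """Count how many times `name` is read inside `e`."""
    if isinstance(e, Load):
        return int(e.var == name)
    return sum(count_loads(x, name) for x in e.args)


def inline(stmts: List[Store], name: str) -> List[Store]:
    """Hypothetical toy `inline`: needs exactly one plain (non-reducing) Store to `name`."""
    defs = [s for s in stmts if s.var == name]
    if len(defs) != 1 or defs[0].reduce:
        raise ValueError(f"cannot inline {name}")
    value = defs[0].expr
    return [replace(s, expr=substitute(s.expr, name, value))
            for s in stmts if s.var != name]


def anti_inline(stmts: List[Store], name: str) -> List[Store]:
    """Hypothetical toy `anti_inline`: `name` must be read exactly once, by an
    unmodified copy `dest = name`; every Store to `name` is redirected to `dest`."""
    copies = [s for s in stmts if s.expr == Load(name) and not s.reduce]
    reads = sum(count_loads(s.expr, name) for s in stmts)
    if len(copies) != 1 or reads != 1:
        raise ValueError(f"cannot anti_inline {name}")
    copy_stmt = copies[0]
    return [replace(s, var=copy_stmt.var) if s.var == name else s
            for s in stmts if s is not copy_stmt]


def is_matmul(stmts: List[Store]) -> bool:
    """Toy matcher: accepts only a single reducing Store of a matmul call."""
    return (len(stmts) == 1 and stmts[0].reduce
            and isinstance(stmts[0].expr, Call) and stmts[0].expr.op == "matmul")


def try_as_matmul(stmts, to_inline, to_anti_inline):
    """Transform a copy; if any step or the final match fails, return the original body."""
    try:
        cand = stmts
        for v in to_inline:
            cand = inline(cand, v)
        for v in to_anti_inline:
            cand = anti_inline(cand, v)
    except ValueError:
        return stmts, False
    return (cand, True) if is_matmul(cand) else (stmts, False)


# Toy version of the expanded body: t = transpose(b); u += a[i] @ t; c[i] = u
body = [
    Store("t", Call("transpose", (Load("b"),))),
    Store("u", Call("matmul", (Load("a[i]"), Load("t"))), reduce=True),
    Store("c[i]", Load("u")),
]
result, ok = try_as_matmul(body, ["t"], ["u"])
```

Here `try_as_matmul(body, ["t"], ["u"])` yields the single statement `c[i] += matmul(a[i], transpose(b))`, while also trying to inline `u` fails and leaves `body` unchanged.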

After these changes, `schedule/as_matmul` is expected to match this code to a Matmul, but it still cannot handle the code's derivatives. To support such code in AD, we need one more change:

  • Match multiple Matmuls simultaneously in one `schedule/as_matmul` call, instead of relying on `#! prefer_libs` to fission the loops.
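To see why derivatives need this, consider the gradient of `c[i] = a[i] @ ft.transpose(b, (1, 0))`: it is `da = dc @ b` and `db = transpose(dc) @ a`. The sketch below is a hand-written guess at a backward loop that computes both inside one loop over `i`; it is not FreeTensor's actual AD output, and `backward_loop`, `matmul` and `transpose` are illustrative helpers. Each half of the loop body is one Matmul on its own, so a matcher limited to one Matmul per call needs the loop fissioned first.

```python
def transpose(m):
    """Transpose a 2-D list."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Plain matrix product of 2-D lists: (R x T) @ (T x C)."""
    return [[sum(x[r][t] * y[t][col] for t in range(len(y)))
             for col in range(len(y[0]))] for r in range(len(x))]


def backward_loop(a, b, dc):
    """Hand-written guess at a backward loop for c[i] = a[i] @ transpose(b).
    A single loop over i carries two different Matmuls."""
    n_i, n_k, n_j = len(a), len(a[0]), len(b)
    da = [[0] * n_k for _ in range(n_i)]
    db = [[0] * n_k for _ in range(n_j)]
    for i in range(n_i):
        # Matmul 1: da[i] = dc[i] @ b, batched over i
        for k in range(n_k):
            da[i][k] = sum(dc[i][j] * b[j][k] for j in range(n_j))
        # Matmul 2: db += outer(dc[i], a[i]), i.e. db = transpose(dc) @ a, reducing over i
        for j in range(n_j):
            for k in range(n_k):
                db[j][k] += dc[i][j] * a[i][k]
    return da, db


a = [[1, 2, 3], [4, 5, 6]]
b = [[1, 0, 2], [3, 1, 0]]
dc = [[1, 1], [0, 2]]  # gradient of the loss w.r.t. c (same shape as c)
da, db = backward_loop(a, b, dc)

# Each half of the loop body is, on its own, exactly one matrix product
assert da == matmul(dc, b)
assert db == matmul(transpose(dc), a)
```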

Another approach to the problem (setting AD aside) is to add a new schedule, `uncache`, and run it automatically inside `as_matmul`. As the name suggests, `uncache` undoes the `cache` schedule: `uncache(v)` detects whether `v` maps to a parent VarDef `u`, and if so replaces `v` by `u` with the corresponding indices.
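A toy version of `uncache` might look as follows. It is a sketch under several assumptions: the IR (`Const`, `Load`, `Mul`, `Store`) and the helpers `find_parent`, `rename` and `uncache` are hypothetical, loops are left implicit in symbolic indices, and a variable counts as mapping to a parent only through a single fill (`v[x] = p[pre + x]`) or flush (`p[pre + x] = v[x]`) whose indices agree up to a leading prefix.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Index = Tuple[str, ...]  # symbolic indices such as ("i", "j"); loops are left implicit


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Load:
    var: str
    idx: Index


@dataclass(frozen=True)
class Mul:
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Const, Load, Mul]


@dataclass(frozen=True)
class Store:
    var: str
    idx: Index
    expr: Expr
    reduce: bool = False  # True models `var[idx] += expr`


def find_parent(stmts: List[Store], v: str) -> Optional[Tuple[str, Index, Store]]:
    """Find a fill `v[x] = p[pre + x]` or a flush `p[pre + x] = v[x]`.
    Returns (p, pre, copy_stmt), or None if v does not map to another variable."""
    for s in stmts:
        if s.reduce or not isinstance(s.expr, Load):
            continue
        if s.var == v:            # fill: v is loaded from its parent
            inner, outer, parent = s.idx, s.expr.idx, s.expr.var
        elif s.expr.var == v:     # flush: v is stored back to its parent
            inner, outer, parent = s.expr.idx, s.idx, s.var
        else:
            continue
        n = len(outer) - len(inner)
        if n >= 0 and outer[n:] == inner:
            return parent, outer[:n], s
    return None


def rename(e: Expr, v: str, parent: str, pre: Index) -> Expr:
    """Rewrite every read v[x] inside e into parent[pre + x]."""
    if isinstance(e, Load):
        return Load(parent, pre + e.idx) if e.var == v else e
    if isinstance(e, Mul):
        return Mul(rename(e.lhs, v, parent, pre), rename(e.rhs, v, parent, pre))
    return e


def uncache(stmts: List[Store], v: str) -> List[Store]:
    """Hypothetical toy `uncache`; assumes v is accessed only by the statements given."""
    found = find_parent(stmts, v)
    if found is None:
        raise ValueError(f"{v} does not map to a parent variable")
    parent, pre, copy_stmt = found
    out = []
    for s in stmts:
        if s is copy_stmt:
            continue  # the fill/flush is redundant once v is replaced by its parent
        var, idx = (parent, pre + s.idx) if s.var == v else (s.var, s.idx)
        out.append(Store(var, idx, rename(s.expr, v, parent, pre), s.reduce))
    return out


# Toy expansion of `u = a[i] @ t; c[i] = u` inside the loop over i
body = [
    Store("u", ("j",), Const(0)),
    Store("u", ("j",), Mul(Load("a", ("i", "k")), Load("t", ("k", "j"))), reduce=True),
    Store("c", ("i", "j"), Load("u", ("j",))),
]
result = uncache(body, "u")
```

Applied to `u` in the running example, the sketch rewrites `u[j]` to `c[i, j]` and drops the copy `c[i, j] = u[j]`. Because it recognizes only prefix offsets, it would not handle the transposed `t`, whose indices are permuted relative to `b`.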