Improve `schedule/as_matmul` matching on hybrid loop-libop code
roastduck opened this issue · 1 comments
Current `schedule/as_matmul` can't match some basic cases when there are both loops and libop calls. Example:
for i in range(500):
    c[i] = a[i] @ ft.transpose(b, (1, 0))
This code cannot be mapped to a `Matmul` node because libop introduces intermediate local variables. The code actually looks like this:
for i in range(500):
    t = ft.transpose(b, (1, 0))
    u = a[i] @ t
    c[i] = u
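As a quick sanity check of what `as_matmul` is expected to recognize, the expanded loop can be compared against a single matrix multiplication. This is only a NumPy sketch, not FreeTensor code: `np.transpose` stands in for `ft.transpose`, and all shapes except the loop bound 500 are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Only the loop bound (500) comes from the issue; the other sizes are assumed.
a = rng.standard_normal((500, 8))
b = rng.standard_normal((16, 8))

# Loop form, with the intermediates introduced by libop written out.
c_loop = np.empty((500, 16))
for i in range(500):
    t = np.transpose(b, (1, 0))  # stands in for ft.transpose(b, (1, 0))
    u = a[i] @ t
    c_loop[i] = u

# The single matrix multiplication the whole loop is equivalent to.
c_matmul = a @ b.T

assert np.allclose(c_loop, c_matmul)
```

The intermediates `t` and `u` do not change the result; they only hide the matmul pattern from the matcher.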
To deal with this case, we need the following two changes:
- Call `inline` on every variable inside the matching sub-tree. There will be no side effect, because the matching will fail otherwise. This will solve the problem of `t` in this example.
- Implement a new schedule, maybe named `anti_inline`, that removes an intermediate variable and redirects all `Store`s to it to the final destination. We can implement `anti_inline` only for variables that are copied to another variable without modifications (like `c[i] = u`). There will also be no side effect, because the matching will fail otherwise. This will solve the problem of `u`.
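The effect of the two changes above can be sketched on a toy representation of the loop body. This is not FreeTensor's IR or API: statements are plain `(target, expr)` pairs, expressions are nested tuples, and the helper functions `inline` and `anti_inline` below are hypothetical stand-ins that ignore legality checks.

```python
def substitute(expr, var, replacement):
    """Replace every occurrence of `var` in a nested-tuple expression."""
    if expr == var:
        return replacement
    if isinstance(expr, tuple):
        return tuple(substitute(e, var, replacement) for e in expr)
    return expr


def inline(stmts, var):
    """Toy `inline`: drop the definition of `var` and paste it into its uses."""
    definition = next(expr for target, expr in stmts if target == var)
    return [(target, substitute(expr, var, definition))
            for target, expr in stmts if target != var]


def anti_inline(stmts, var):
    """Toy `anti_inline`: if `dest = var` is a plain copy, store to `dest` directly."""
    dest = next(target for target, expr in stmts if expr == var)
    return [(dest if target == var else target, expr)
            for target, expr in stmts if expr != var]


# Loop body from the issue, after libop expansion.
body = [
    ("t", ("transpose", "b")),
    ("u", ("matmul", "a[i]", "t")),
    ("c[i]", "u"),
]

body = inline(body, "t")       # removes `t`
body = anti_inline(body, "u")  # removes `u`, the matmul now stores to c[i]
print(body)
```

After both steps the body is a single statement storing the matmul of `a[i]` and the transposed `b` into `c[i]`, which is the shape the matcher needs.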
After implementing the changes above, `schedule/as_matmul` is expected to match this code to a `Matmul`, but it still can't deal with its derivatives. To accept such code in AD, we will need one more change:
- Simultaneously match multiple `Matmul`s in one `schedule/as_matmul` call, instead of relying on `#! prefer_libs` to fission the loops.
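To illustrate why a derivative can contain more than one matmul in a single loop, here is a NumPy sketch of the backward pass of `c[i] = a[i] @ b.T`, written by hand. How FreeTensor's AD actually emits this loop is not shown in the issue, so the loop structure, the name `dc` for the upstream gradient, and all shapes except the bound 500 are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Only the loop bound (500) comes from the issue; the other sizes are assumed.
a = rng.standard_normal((500, 8))
b = rng.standard_normal((16, 8))
dc = rng.standard_normal((500, 16))  # assumed upstream gradient of c

# Hand-written backward loop: one loop, two accumulation patterns.
da_loop = np.zeros_like(a)
db_loop = np.zeros_like(b)
for i in range(500):
    da_loop[i] = dc[i] @ b            # row i of the first product
    db_loop += np.outer(dc[i], a[i])  # rank-1 update of the second product

# The two matrix multiplications hidden in that single loop.
da = dc @ b
db = dc.T @ a

assert np.allclose(da_loop, da)
assert np.allclose(db_loop, db)
```

Matching both products in one `as_matmul` call would avoid having to fission such a loop first.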
Another approach to solving the problem (without considering AD) is to add a new schedule `uncache` and run it automatically inside `as_matmul`. As the name suggests, `uncache` undoes the `cache` schedule. `uncache(v)` detects whether `v` maps to a parent `VarDef` `u`, and replaces `v` by `u` with the corresponding indices.
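The index rewriting that `uncache` would perform can be sketched on a toy expression tree. This is not FreeTensor's IR: the `Access` class and the `uncache` function below are hypothetical, and the detection step (finding out that `v` maps to `u`) is skipped by passing the mapping in by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """Toy tensor access: a variable name plus a tuple of index expressions."""
    name: str
    indices: tuple


def uncache(node, var, parent, index_map):
    """Rewrite every access to `var` into an access to `parent`.

    `index_map` turns the indices used on `var` into the corresponding
    indices on `parent`; here it is supplied by the caller (an assumption).
    """
    if isinstance(node, Access):
        if node.name == var:
            return Access(parent, index_map(node.indices))
        return node
    if isinstance(node, tuple):
        return tuple(uncache(n, var, parent, index_map) for n in node)
    return node


# Assumed setup: v caches row i of u, so v[j] corresponds to u[i, j].
stmt = (Access("c", ("i",)),
        ("add", Access("v", ("j",)), Access("v", ("k",))))

rewritten = uncache(stmt, "v", "u", lambda idx: ("i",) + idx)
print(rewritten)
```

Once every access to `v` has been redirected to `u`, the definition of `v` and the copy that fills it can be dropped.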