cornell-zhang/hcl-dialect

[Frontend] "Detached Operation Already Exists" Assertion Fails At Random

Closed this issue · 4 comments

Description

  • pytest randomly aborts while running tests/mlir/test_schedule_streaming.py
  • About 1 out of 10 runs of python tests/mlir/test_schedule_streaming.py fails with this assertion error:
llvm-project-14.0.0/mlir/lib/Bindings/Python/IRCore.cpp:955: static mlir::python::PyOperationRef mlir::python::PyOperation::createDetached(mlir::python::PyMlirContextRef, MlirOperation, pybind11::object): Assertion `liveOperations.count(operation.ptr) == 0 && "cannot create detached operation that already exists"' failed.
 #0 0x00007f22fbc9a05f PrintStackTraceSignalHandler(void*) Signals.cpp:0:0

Relevant Tests

Test cases in tests/mlir/test_schedule_streaming.py, especially test_move_outputs.

This happens in the separate_host_device Python function, at this line:

https://github.com/cornell-zhang/heterocl/blob/2e342b719719957375006d685f7e6199fa5c81a2/python/heterocl/mlir/schedule.py#L233

The assertion is triggered when the ret_zero ConstantOp is built.

From reading the createDetached function, the assertion fails when the operation we try to build is already present among the live operations tracked by the current context.

This issue is randomly triggered when we create multiple modules in the same context. The context owns MLIR types and attributes and keeps track of live operations, and modules cannot reference each other's operations, so I think it makes sense to give each module its own context.

I tried this and the issue was resolved.
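
To illustrate the invariant behind the assertion, here is a toy model in plain Python. It is not the MLIR binding code: ToyContext, create_detached, build_ret_zero, and the fixed key 0x1000 are all hypothetical names chosen for this sketch. The key stands in for operation.ptr, and the sketch does not model how two builds end up with the same key in the real bindings. It only shows that a per-context registry rejects a duplicate key, and that giving each module its own context avoids the collision.

```python
class ToyContext:
    """Toy stand-in for a context that tracks live operations."""

    def __init__(self):
        # Maps an operation key (standing in for operation.ptr) to a name.
        self.live_operations = {}

    def create_detached(self, op_key, name):
        # Mirrors the failing check: the key must not already be tracked.
        if op_key in self.live_operations:
            raise AssertionError(
                "cannot create detached operation that already exists")
        self.live_operations[op_key] = name
        return name


def build_ret_zero(ctx, op_key=0x1000):
    # Hypothetical helper: registers a "ret_zero" op under a fixed key.
    return ctx.create_detached(op_key, "ret_zero")


def shared_context_fails():
    # Two builds against one context collide on the same key.
    shared = ToyContext()
    build_ret_zero(shared)
    try:
        build_ret_zero(shared)
    except AssertionError:
        return True
    return False


def separate_contexts_succeed():
    # Each build gets a fresh context, so its registry starts empty.
    return [build_ret_zero(ToyContext()) for _ in range(2)]
```

In this model, shared_context_fails() hits the duplicate-key check, while separate_contexts_succeed() completes both builds because each registry is independent.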

Fixed by this commit in the HeteroCL repo: 86f6ff366551314bdf8d1b6ba434f4bd1cf1a197