cornell-zhang/heterocl

Stack Overflow for Recursive IR Traversal


Because the current IR structure is based on Halide IR, we traverse the IR recursively, which can overflow the stack when the IR is deep. Even when we do not hit a segfault, the traversal may occupy an excessive amount of stack memory. One possible solution is to transform the deep IR into a wide IR and apply iterative IR traversal.
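To make the recursive-versus-iterative distinction concrete, here is a minimal sketch on a toy tree. The `Node` class and the helper names are hypothetical stand-ins for illustration, not the Halide IR node types or the HeteroCL visitor API. The iterative version keeps pending nodes in a heap-allocated list, so its depth is bounded by available memory rather than by the call stack.

```python
class Node:
    """Hypothetical toy IR node (an assumption, not a Halide IR type)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def count_recursive(node):
    # One call frame per nesting level: a very deep IR exhausts the stack
    # (in CPython this surfaces as RecursionError at the default limit).
    return 1 + sum(count_recursive(child) for child in node.children)


def count_iterative(root):
    # Pending nodes live in an explicit list on the heap, so nesting depth
    # no longer consumes call-stack space.
    count = 0
    stack = [root]
    while stack:
        node = stack.pop()
        count += 1
        stack.extend(node.children)
    return count


def make_chain(depth):
    # Build a maximally "deep" IR: every node has exactly one child.
    node = Node("leaf")
    for i in range(depth - 1):
        node = Node(f"n{i}", [node])
    return node


if __name__ == "__main__":
    deep_ir = make_chain(20000)
    print(count_iterative(deep_ir))
```

Both functions agree on shallow trees; only the iterative one is safe on the deep chain.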

Bumping this since we have been running into the issue recently. The build and simulation process takes surprisingly long as the number of compute units (CUs) in the input HCL program increases. If the number of CUs keeps increasing, at some point the build process crashes with a segfault caused by stack overflow.

Reproducing the scalability issue

We were able to reproduce the scalability issue with the KNN digitrec example by increasing the number of CUs in KNN digitrec to stress-test the HCL compiler. Here are some profiling results on a server-class Linux machine and a local Windows desktop (using the WSL environment). In the figures, the end of the x-axis (# of CUs) indicates the maximum number of CUs that can be built without crashing.

[Figure: scalability_brg_zhang_xcel]
[Figure: scalability_desktop]

The segfault can be alleviated by increasing the stack size with ulimit -s $STACK_SIZE. For example, after increasing the stack size from the default 8192 KB to 65536 KB, the HCL compiler can handle more compute unit instantiations in the input digitrec KNN program. However, the build is still very slow, and we need to find ways to make it faster.
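For anyone who wants to apply the same workaround from a script instead of the shell, here is a rough sketch that launches a command in a child process with a raised stack limit. It assumes a Unix-like system (Python's `resource` module and `preexec_fn` are not available on native Windows). The helper names `clamp_stack_limit` and `run_with_stack` are made up for this example, and the command passed in is a placeholder for whatever build script you run.

```python
import resource
import subprocess
import sys


def clamp_stack_limit(requested_kb):
    """Return (soft, hard) in bytes for RLIMIT_STACK.

    The requested soft limit is clamped to the current hard limit, because
    an unprivileged process cannot raise the soft limit above it.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_STACK)
    requested = requested_kb * 1024
    if hard != resource.RLIM_INFINITY and requested > hard:
        requested = hard
    return requested, hard


def run_with_stack(cmd, stack_kb):
    """Run cmd in a child process whose stack limit is set to stack_kb."""
    soft, hard = clamp_stack_limit(stack_kb)

    def raise_limit():
        # Runs in the child just before exec, like `ulimit -s` in a shell.
        resource.setrlimit(resource.RLIMIT_STACK, (soft, hard))

    return subprocess.run(
        cmd, preexec_fn=raise_limit, capture_output=True, text=True, check=True
    )


if __name__ == "__main__":
    # Placeholder command: the child just reports its own soft stack limit.
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_STACK)[0])"
    result = run_with_stack([sys.executable, "-c", probe], 65536)
    print(result.stdout.strip())
```

Like ulimit, this only postpones the crash to a larger CU count and does nothing for the build time.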

[Figure: scalability]

How to fix the scalability issue?

As Sean mentioned, the recursive traversal in the Halide IR Mutator is the root cause of the scalability issue. The best way to fix it is to redesign and replace the original Halide IR Mutator so that it no longer traverses the IR recursively. However, I would not expect this to be done quickly: the Halide IR Mutator is used in almost every step of the lowering process, so it may take a lot of effort to ensure that a new IR Mutator does not break the original flow.
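As a rough illustration of what a non-recursive mutator could look like, here is a sketch of a post-order rewrite driven by an explicit stack. Everything in it is an assumption made for illustration: `Expr`, `mutate_iterative`, and `fold_add` are hypothetical names, and the real Halide IR Mutator dispatches on many node types and is written in C++. The idea is to visit each node twice, once to schedule its children and once to rebuild it from the already-rewritten children.

```python
class Expr:
    """Hypothetical toy expression node (not a Halide IR type)."""

    def __init__(self, op, children=(), value=None):
        self.op = op
        self.children = tuple(children)
        self.value = value


def mutate_iterative(root, rewrite):
    """Post-order rewrite without recursion.

    rewrite(node, new_children) must return the replacement node.
    """
    results = {}  # id(original node) -> rewritten node
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if id(node) in results:
            continue  # shared sub-expression that was already rewritten
        if children_done:
            new_children = [results[id(child)] for child in node.children]
            results[id(node)] = rewrite(node, new_children)
        else:
            # Revisit this node after all of its children are rewritten.
            stack.append((node, True))
            for child in node.children:
                stack.append((child, False))
    return results[id(root)]


def fold_add(node, new_children):
    # Example rewrite rule: fold an add whose operands are all constants.
    if node.op == "add" and all(c.op == "const" for c in new_children):
        return Expr("const", value=sum(c.value for c in new_children))
    return Expr(node.op, new_children, node.value)


if __name__ == "__main__":
    # A deeply nested chain: add(add(add(1, 1), 1), ...).
    expr = Expr("const", value=1)
    for _ in range(20000):
        expr = Expr("add", [expr, Expr("const", value=1)])
    folded = mutate_iterative(expr, fold_add)
    print(folded.op, folded.value)
```

The rewrite rule keeps the same shape as a per-node mutate method, which is why a migration could in principle proceed pass by pass, though each pass would still need to be validated against the existing flow.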

I am connecting with @chhzh123 and @zzzDavid to discuss possible fixes in the HCL-MLIR compilation flow.