cornell-zhang/hcl-dialect

[MLIR Limitation] LLVM backend only supports modulo on <=128-bit integers

zzzDavid opened this issue · 2 comments

This thread documents a limitation of the modulo operation in the LLVM backend: the integer remainder operation (arith.remsi in the IR, llvm.srem after lowering) does not support integers wider than 128 bits.

Test case:

import heterocl as hcl

def test_mod_ui():
    hcl.init()

    def kernel():
        x = hcl.scalar(19, "x", dtype="uint128")
        y = hcl.scalar(5, "y", dtype="uint128")
        z = hcl.scalar(0, "z", dtype="uint128")
        z.v = (x.v + 5 - y.v) % z.v
    
    s = hcl.create_schedule([], kernel)
    print(hcl.lower(s))
    f = hcl.build(s)
    f()

Error message:

Compiling with translate -> llc -> gcc:

undefined reference to `__umodei4'

JIT compilation:

JIT session error: Symbols not found: [ __umodei4 ]
Expected<T> must be checked before access or destruction.
Unchecked Expected<T> contained error:
Failed to materialize symbols: { (main, { top, _mlir_top, _mlir__mlir_ciface_top, _mlir_ciface_top }) }

The same issue was raised before on the Halide-based HeteroCL, although the error message looks different:

cornell-zhang/heterocl#353

Bypassed by commit: cornell-zhang/heterocl@6a3015b

The frontend raises a warning when a modulo operation involves integers wider than 128 bits, and caps the operand bitwidth at 128 bits.
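
The capping behaviour described here could be sketched roughly as follows. This is only an illustration, not the actual frontend code: the names MAX_MOD_BITWIDTH and cap_modulo_bitwidth are hypothetical, and only the warning text is taken from this thread.

```python
import warnings

# Limit reported in this thread; the constant name is hypothetical.
MAX_MOD_BITWIDTH = 128


def cap_modulo_bitwidth(bitwidth: int) -> int:
    """Return the bitwidth to use for a modulo operand.

    Hypothetical helper: warns and caps at 128 bits when the requested
    width is larger, otherwise returns the width unchanged.
    """
    if bitwidth > MAX_MOD_BITWIDTH:
        warnings.warn("[Data Type] Modulo only supports integer <= 128 bits")
        return MAX_MOD_BITWIDTH
    return bitwidth
```

With a helper like this, a 131-bit intermediate such as the one produced by (x.v + 5 - y.v) would be truncated to 128 bits before the remainder is taken, while narrower operands pass through untouched.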

Output of the test case above:

[Data Type] Modulo only supports integer <= 128 bits
  warnings.warn(self.message, category=self.category)
module {
  func.func @top() attributes {itypes = "", otypes = ""} {
    %c0 = arith.constant 0 : index
    %0 = memref.alloc() {name = "x"} : memref<1xi128>
    %c19_i32 = arith.constant 19 : i32
    %1 = arith.extsi %c19_i32 {unsigned} : i32 to i128
    affine.store %1, %0[%c0] {to = "x", unsigned} : memref<1xi128>
    %2 = memref.alloc() {name = "y"} : memref<1xi128>
    %c5_i32 = arith.constant 5 : i32
    %3 = arith.extsi %c5_i32 {unsigned} : i32 to i128
    affine.store %3, %2[%c0] {to = "y", unsigned} : memref<1xi128>
    %4 = memref.alloc() {name = "z"} : memref<1xi128>
    %c0_i32 = arith.constant 0 : i32
    %5 = arith.extsi %c0_i32 {unsigned} : i32 to i128
    affine.store %5, %4[%c0] {to = "z", unsigned} : memref<1xi128>
    %6 = affine.load %0[0] {from = "x", unsigned} : memref<1xi128>
    %7 = arith.extui %6 : i128 to i130
    %8 = arith.extsi %c5_i32 : i32 to i130
    %9 = arith.addi %7, %8 : i130
    %10 = affine.load %2[0] {from = "y", unsigned} : memref<1xi128>
    %11 = arith.extsi %9 : i130 to i131
    %12 = arith.extui %10 : i128 to i131
    %13 = arith.subi %11, %12 : i131
    %14 = affine.load %4[0] {from = "z", unsigned} : memref<1xi128>
    %15 = arith.trunci %13 : i131 to i128
    %16 = arith.remsi %15, %14 : i128
    affine.store %16, %4[0] {to = "z", unsigned} : memref<1xi128>
    return
  }
}
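
One consequence of the truncation from i131 to i128 is that the result can differ from the exact remainder whenever the intermediate value does not fit in 128 bits. The sketch below models this with plain Python integers, treating all values as unsigned. The truncate helper, the overflowing operand, and the divisor 7 are assumptions chosen for illustration; they are not taken from the test case, whose own divisor is z = 0.

```python
def truncate(value: int, bits: int) -> int:
    """Keep only the low `bits` bits, like a truncation of an unsigned value."""
    return value & ((1 << bits) - 1)


def exact_and_capped_mod(value: int, divisor: int, bits: int = 128):
    """Return (exact remainder, remainder after truncating value to `bits`)."""
    return value % divisor, truncate(value, bits) % divisor


# Values from the test case: 19 + 5 - 5 fits easily in 128 bits,
# so the truncation changes nothing.
small = 19 + 5 - 5
small_exact, small_capped = exact_and_capped_mod(small, 7)

# Hypothetical operand that overflows 128 bits: the largest uint128 plus 5.
wide = ((1 << 128) - 1) + 5
wide_exact, wide_capped = exact_and_capped_mod(wide, 7)

print(small_exact, small_capped)  # both remainders agree
print(wide_exact, wide_capped)    # the remainders differ after truncation
```

For the values in the test case the cap is harmless, but for operands near the top of the uint128 range the capped result is the remainder of the wrapped value rather than of the full-width sum.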