cornell-zhang/hcl-dialect

[Frontend][API] Support `hcl.print`

Closed this issue · 13 comments

We need to support print operation in HeteroCL for both HLS and LLVM backends. The basic idea is to build a loop nest to print out every element in a memref.
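As a rough illustration of the loop-nest idea, the Python sketch below walks a row-major buffer with one loop per dimension and formats each element. The function name `format_elements` and the flat-list representation of a memref are assumptions made for this example; they are not part of HeteroCL or the hcl dialect.

```python
from itertools import product


def format_elements(flat, shape, fmt="%d"):
    """Format every element of a row-major buffer, one loop per dimension.

    `flat` stands in for the memref's underlying storage and `shape` for its
    static shape; both are illustrative assumptions.
    """
    lines = []
    # product(range(d0), range(d1), ...) plays the role of the nested for loops.
    for idx in product(*(range(d) for d in shape)):
        # Row-major linearization of the multi-dimensional index.
        offset = 0
        for i, d in zip(idx, shape):
            offset = offset * d + i
        # One printf-style call per element.
        lines.append(fmt % flat[offset])
    return lines


if __name__ == "__main__":
    print("\n".join(format_elements([1, 2, 3, 4, 5, 6], (2, 3))))
```

A scalar fits the same scheme as a buffer of shape `(1,)`, which is the direction the later comments in this thread take.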

Relevant test case: test_codegen_vhls.py::test_print

See the Toy language tutorial; it may be useful for defining our own print operation.

This needs frontend support. Also, the object to be printed is not necessarily a memref; it can be a single scalar.

I'm thinking we could support printing scalars and tensor slices in the frontend: we create a memref from the scalar or tensor slice, and keep the memref interface of the PrintOp.

Yep, you can test whether this works.

I'm here to document a change:

Currently, the PrintOp is lowered in the lower-to-llvm pass to an scf.for loop nest that calls printf on each memref element. This implementation supports the built-in MLIR types, but not fixed-point numbers. Fixed-point numbers are lowered to integers before we lower to LLVM, so what gets printed is the fixed-point number's underlying base integer. For example, 4.25 in hcl.Fixed(4.2) will be printed as 4.25 * 2**2 = 17.

I will move the PrintOp lowering out of the lower-to-llvm pass into a separate pass that runs while the fraction-bit information is still available, so that fixed-point numbers can be printed correctly.
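The arithmetic behind the fixed-point example can be checked with a short Python sketch. The helper names `to_base_integer` and `from_base_integer` are hypothetical and only illustrate why the number of fraction bits must be known at print time; they do not reflect how the pass is actually written.

```python
def to_base_integer(value, frac_bits):
    """Encode a real value as the integer that stores it in fixed point."""
    return int(round(value * (1 << frac_bits)))


def from_base_integer(raw, frac_bits):
    """Recover the real value from the stored integer and the fraction bits."""
    return raw / (1 << frac_bits)


if __name__ == "__main__":
    raw = to_base_integer(4.25, 2)
    print(raw)                        # 17: what gets printed once the type is lowered
    print(from_base_integer(raw, 2))  # 4.25: what should be printed
```

Without `frac_bits`, the second step is impossible, which is why the print lowering has to run before the fixed-point types are erased.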

The original HCL supports passing a format string to hcl.print. Since we can't pass a string as an input argument to an MLIR operation, I'm adding the format string as an attribute on PrintOp.

I ran into an issue after moving the print lowering to a separate pass. Although the PrintOp can be lowered to for loops, the generated call to printf fails the verifier:

error: `std.call` op `printf` does not reference a valid function

The current implementation supports the following printing features:

  • Print a tensor
  • Print a number
  • Print an expression
  • Print with a format string

The one case still missing is printing a tuple. This will be added in the future.

Printing a tensor slice also needs to be supported. This is a bit trickier, since A[0] is not actually built when calling the kernel function; it only returns a TensorSlice instance.

A = hcl.placeholder((10, 32))
def kernel(A):
    hcl.print(A[0])

We can support such cases. My implementation first allocates a new memref, then stores the elements loaded from the tensor into the new memref, and then prints the memref:

A = hcl.placeholder((10,))

def kernel(A):
    hcl.print(A[5])

module {
  func @top(%arg0: memref<10xi32>) attributes {extra_itypes = "s", extra_otypes = ""} {
    %c5 = arith.constant 5 : index
    %0 = affine.load %arg0[5] {from = "compute_0"} : memref<10xi32>
    %1 = memref.alloc() {name = "scalar_0"} : memref<1xi32>
    %c0 = arith.constant 0 : index
    affine.store %0, %1[0] {to = "scalar_0"} : memref<1xi32>
    hcl.print(%1) : memref<1xi32>
    return
  }
}
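The staging step in the IR above (allocate, load, store, then print) can be mimicked in plain Python. This is only a sketch: `stage_for_print` is a hypothetical helper, nested lists stand in for memrefs, and the row-slice branch is an assumption about how a case like A[0] on a 2-D tensor could be handled, not something the IR above demonstrates.

```python
def stage_for_print(tensor, index):
    """Copy tensor[index] into a freshly allocated buffer before printing."""
    elem = tensor[index]
    if isinstance(elem, list):
        # Row slice: allocate a buffer of the row's length and copy element-wise.
        buf = [0] * len(elem)
        for i, value in enumerate(elem):
            buf[i] = value
        return buf
    # Scalar: a one-element buffer, like memref<1xi32> in the IR above.
    buf = [0]
    buf[0] = elem
    return buf


if __name__ == "__main__":
    A = list(range(10))
    print(stage_for_print(A, 5))
```

Because the result is always a buffer, the print operation can keep a single memref-based interface for tensors, slices, and scalars.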

Yeah, no hurry to do that. We can come back to fix it later.

Currently, hcl.print is lowered to calls to the C function printf inside loops. However, the printed result is not properly formatted, and the order of the printed numbers is incorrect when there is more than one hcl.print.

I think we can use or implement printing functions in C and call them from MLIR, such as:
https://github.com/llvm/llvm-project/blob/08860f525a2363ccd697ebb3ff59769e37b1be21/mlir/lib/ExecutionEngine/RunnerUtils.cpp#L97-L100

As of commit 42eaa5e, the print facility has been implemented. We have two operations: hcl.print and hcl.print_memref. hcl.print supports variadic inputs and a format string, and is implemented with C's printf under the hood. hcl.print_memref uses the printing functions defined in MLIR's runner utilities.

Refer to these test cases for usage examples:
https://github.com/cornell-zhang/hcl-dialect-prototype/blob/main/test/Operations/print/print.mlir