cucapra/dahlia

A case for return values from functions

Closed this issue · 12 comments

fft-transpose rewrite uses functions with return values
https://github.com/cucapra/fuse-benchmarks/blob/ea7f4fe77b4822e76a256de9a6825c9240a55224/rewrite/machsuite-fft-transpose/fft.cpp#L173-L174

aes can benefit from the ability to write functions with return values.
https://github.com/cucapra/fuse-benchmarks/blob/ea7f4fe77b4822e76a256de9a6825c9240a55224/baseline/machsuite-aes/aes.cpp#L53-L100

I think these provide an incentive to have functions with a single return value.

However, in the long run, returning a tuple might be useful.

An example from fft-transpose:
https://github.com/cucapra/fuse-benchmarks/blob/ea7f4fe77b4822e76a256de9a6825c9240a55224/baseline/machsuite-fft-transpose/fft.cpp#L39-L61

Passing an array to these functions requires creating a local array and copying it element by element. So an overhead-free hardware function which can simply be inlined seems useful for modularity.

def c: ap_int<8> foo (a: ap_int<8>) {
    c := a;
}

...
let temp = foo(arr[i]);
---
arr[i] := temp;
...

is what I imagine it to look like. foo is combinational, and explicit sequencing handles the read-write dependency; i.e., arr[i] := foo(arr[i]) would throw an error.

Can you explain how the function signature should be read? What is c supposed to be?

My counter proposal would be to have the last value in the function automatically be the return value.

Also, are functions with return values required to be combinational? If so, why, and what restrictions are imposed?

Can you explain how the function signature should be read? What is c supposed to be?

ap_int<8> foo(ap_int<8> a) {
    ap_int<8> c = a;
    return c;
}

My counter proposal would be to have the last value in the function automatically be the return value.

Not sure whether I follow that. What happens when the last argument is an array?
But I think that'll work for the two examples above.

Also, are functions with return values required to be combinational? If so, why, and what restrictions are imposed?

No. And that's a slightly orthogonal thing.

Current functions, which pass arrays,

  • would need internal sequencing to use multiple elements (or use other features)
  • have an implied array-copy phase?

But with that, our language won't be able to express the same inlined design using functions.

int add (a, b) {
   return a + b;
}

...
let a = arr[i];
---
let b = arr[i+1];
---
arr[i+2] := add(a, b);
...

would be same as

...
let a = arr[i];
---
let b = arr[i+1];
---
arr[i+2] := a + b;
...

but

int add (rec[2]) {
   let a = rec[0];
   ---
   let b = rec[1];
   ---
   return a + b;
}

...
let a = arr[i];
---
let send[2] = { a, arr[i+1] };
---
arr[i+2] := add(send);
...

has more sequencing.

That seems more like an optimization. The compiler can eliminate the send array after inlining.

I’m more interested in understanding what the semantics will mean for actual hardware blocks and if we should talk about it now.

Otherwise I’m pretty sure inlining the block would get the same results (we should write an experiment and test the hypothesis)

I think return values would be really useful, if they're quite restrictive!

First, let's acknowledge that there are two kinds of arguments to Fuse functions: normal values, and memories. They're pretty different semantically (passed by value, and passed by reference), and they're different in the hardware model too (just sending wires into the module, and delegating access to a memory). But both are compatible with inlining semantics.

I don't think we should return memories. That doesn't make much sense. So IMO return values should work like "normal wire" arguments, but in reverse. And we should absolutely preserve inlining semantics—that is, a returned expression should be exactly the same as putting the output in a result variable and using that.

I think the natural syntax would be a suffixed type:

// A silly add function made slightly more complex for illustrative purposes.
def add(a: bit<32>, b: bit<32>): bit<32> {
    let v = a + b;
    return v - 0;
}

Then this:

def main() {
    let x = 1 * 2;
    mem[0] := add(x, 3 * 4) * 5;
}

Is equivalent to this:

def main_inlined() {
    let x = 1 * 2;

    // Inlined function call!
    let _add_a = x;
    let _add_b = 3 * 4;
    let _add_v = _add_a + _add_b;
    let _add_ret = _add_v - 0;

    mem[0] := _add_ret * 5;
}

If it's OK with you, @rachitnigam, I'll throw together a quick implementation of this. I think it's the only thing blocking a clean run of all benchmarks in Vivado HLS (namely, aes & fft-transpose really want functions).

Sure. Modulo my concerns below, implementing this seems practical for now.

I don’t understand the meaning of return values in hardware. In an LLVM backend, return values would be registered in one of the return registers, which means that an HLS compiler will turn it into a memory.

So are all returned values put into registers? Can we return local arrays? Structs?

The mental model I have is that the return value is "just a wire" (i.e., really a bundle of wires). So just like when you run any other function, you put values on the input wires, activate the submodule, and wait for a while for it to be done. But in a function with a return value, when it's done, it puts a value on its output wires for the calling module to read off.

So I'd say definitely no to local arrays, and yes to structs. (Recall that immutable structs are "plain" values—just bundles of bits that are logically decomposed into smaller values.)

The bundle-of-wires model might not work that well, because implementing functions as real hardware modules (instead of inlining them) would probably require a ready-valid signal, which might register the values anyway.

True, but in a world where we have non-inlined (shareable) functions, I think the inputs inevitably need to be registered too—they'll also need a ready signal. So in that sense, the inputs and outputs behave similarly to each other in both settings (inlined and not).