cucapra/dahlia

Support for Constants and Dynamic Loops

Closed this issue · 4 comments

Hi,

I'm new to Dahlia, for comparison I was trying to port a HLS accelerated Travelling Salesperson from the Vitis Tutorials repo to Dahlia. I noticed that without support for constants and dynamic loops I wasn't able to port it to Dahlia.

I can try to implement this and raise a PR. Let me know what you think

Idea for Constants

The code could look something like this

const N = 2;

decl a: float[N][N];
decl b: float[N][N];
decl result: float[N][N];

for (let i = 0..N) {
  for (let j = 0..N) {
    result[i][j] := a[i][j] + b[i][j];
  }
}

This could be implemented as a new expression type in Syntax.scala, and can be removed by a pass

case class EConst(c: String) extends Expr

Idea for Dynamic Loops

Not sure how this can be implemented or the challenges that would arise for this one.

for (let k=0..10) {
  for (let j=0..k) {
  }
}

Hi @Mark1626! Thanks for trying to port the design to Dahlia. We do actually support dynamic while loops already so you should be able to write:

while (j < k) {
  body;
  j++
}

The downside is that Dahlia will not allow you to unroll the loop because the type system cannot reason about the stride pattern.

These kinds of "triangular loops" (because the iteration space is along the j=i diagonal) can also be implemented in this way:

for (let k = 0..10) {
  for (let j = 0..10) {
    if (j < k) { body }
  }
}

The upside of this approach is that you can unroll both the loops in this case.

Let us know if this solves your problem!

I managed to port it, I'm having latency issues compared to the original version. The travelling saleperson example from Vitis Tutorial generates possible permutations, calculates the distance of each path and finds the shortest path. It utilises pipelining and a ram_1wnr is binded to the variable distance to achieve desired throughput. Without the ram_1wnr there is a pipelining issue in my implementation, I'm trying to see if I can fix it in my Dahlia source itself.

This is the current distance function.

def get_distance(perm: ubit<32>[10], distance: ubit<16>[10][10]): ubit<32> = {
    let ret: ubit<32> = 0;
    for (let i=0..9) {
        ret += (distance[perm[i]][perm[i+1]] as ubit<32>);
    }
    return ret;
}

Main pipelined loop

while (i < fact_n) pipeline {
    candidate := min(candidate, compute(i, distances));
    i += 1;
}

The access pattern is close to random access, and subsequent iterations also would be accessing the variable distances.

Link to my implementation https://github.com/Mark1626/dahlia-cookbook/tree/master/travelling-salesperson

I could convert it like this and try to use unroll, but this didn't work because of the access pattern. The ARRAY_PARTITION has to be complete rather than cyclic for this scenario as well.

def get_distance(perm: ubit<32>[10], distance: ubit<16>[10][10]): ubit<32> = {
    let ret: ubit<32> = 0;
    for (let i=0..10) {
      if (i < 10) {
        ret += (distance[perm[i]][perm[i+1]] as ubit<32>);
      }
    }
    return ret;
}

https://docs.xilinx.com/r/en-US/ug1399-vitis-hls/Array-Partitioning


I found other issues as well, listing them below with snippets to reproduce the problem.

Environment info

git.status = dirty
git.hash = 806c53c
  1. Parsing large Literals
let max_uint32: ubit<32> = 0xffffffff;
// let max_uint32: ubit<32> = 4294967295;

This throws the following error.

[Error] For input string: "ffffffff"
  1. --header does not wrap the top level function with extern C. I was running CSim and CoSim in Vitis HLS, and I needed a header to import the kernel within my TB.

  2. Scalar result (I need to verify the case again)

decl a: bit<32>[1024];
decl sum: bit<32>;

let acc: bit<32> = 0;

for (let i=0..1024) {
  acc += a[i] + 1;
}

sum := acc;

generates the following HLS code

// git.status = dirty, build.date = Fri Dec 22 16:55:54 IST 2023, git.hash = 806c53c
#include <ap_int.h>
extern "C" {
  void kernel(ap_int<32> a[1024], ap_int<32> sum) {
    #pragma HLS INTERFACE m_axi port=a offset=slave bundle=gmem
    #pragma HLS INTERFACE s_axilite port=a bundle=control
    #pragma HLS INTERFACE s_axilite port=sum bundle=control
    #pragma HLS INTERFACE s_axilite port=return bundle=control
    ap_int<32> acc = 0;
    for(int i = 0; i < 1024; i++) {
      #pragma HLS UNROLL factor=1 skip_exit_check
      #pragma HLS LOOP_FLATTEN off
      acc += (a[i] + 1);
    }
    sum = acc;
  }
}

however the variable sum needs to be called by reference ap_int<32> &sum to get the return value.

@rachitnigam any comments?

Closing this as the points I've raised were fixed in later commits