cucapra/dahlia

Passing by value versus constant reference

cgyurgyik opened this issue · 2 comments

I was looking through some of the Dahlia tests and noticed this.

#include "parser.cpp"
/***************** Parse helpers  ******************/
/***************************************************/
void kernel(vector<vector<double>> a0_0, vector<vector<double>> b0_0, vector<vector<double>> result0_0) {
  
  for(int i = 0; i < 2; i++) {
    for(int j = 0; j < 2; j++) {
      result0_0[i][j] = (a0_0[i][j] + b0_0[i][j]);
    }
  }
  json_t __;
  __["a0_0"] = a0_0;
  __["b0_0"] = b0_0;
  __["result0_0"] = result0_0;
  std::cout << __.dump(2) << std::endl;
}

Specifically:

void kernel(vector<vector<double>> a0_0, vector<vector<double>> b0_0, vector<vector<double>> result0_0) {

Here, we're passing a copy of a0_0 and b0_0 rather than a way to read the vector (i.e. const vector<vector<double>>& a0_0), which doesn't seem necessary. Given a large enough a0_0, this can also be quite expensive.

Similar for result0_0, but instead we want to write to it so something like vector<vector<double>>& result0_0 is appropriate.

I'm assuming this is just a result of not being able to differentiate whether a piece of memory is read-only or not, but I thought it was worth noting.

It is true that we don't have a distinction between input and output memories in Dahlia, but the underlying reason for this even broader. Dahlia can generate C++ in two modes:

  • For CPU-side testing.
  • For Vivado HLS.

In testing (which is the code you're looking at here), speed is not a big concern because we're not modeling any manner of performance—only correctness. And the tests run very fast compared to hardware simulation.

For HLS, performance certainly matters, but of course the interface to the function is not as simple as "copying a vector"; the HLS tool has to generate an actual hardware interface. We generate pragmas and stuff in this mode to tell VHLS what hardware to generate, and I don't think we end up using STL vector IIRC—we use whatever it takes to induce VHLS to produce the hardware interface we want.

Ok that makes sense. I agree, if this is testing only, then there is no need to get fancy, since tests are usually small anyways.