harvard-acc/ALADDIN

Does the memory in ALADDIN support broadcast mechanism?

cylinbao opened this issue · 4 comments

Suppose an accelerator has multiple PEs that share a local buffer.
Can the buffer (memory) in ALADDIN support a broadcast mechanism, so that I don't need to partition it to get correct behavior?

What I want to do is something like this:

int A[2];
int B[4];
int C[4];

for(i=0; i<2; i++)
     for(j=0; j<4; j++)
          C[j] = A[i] + B[j];

I want to unroll the inner for-loop by a factor of 4, but the value read from array A is the same in every iteration of that loop. In my tests, I still need to partition all three arrays (A, B, and C) by a factor of 4 to get the expected performance. So I would like to confirm whether ALADDIN supports data broadcasting.
Thanks.

Just save A[i] into a local variable and use that in the innermost for loop.

for (i = 0; i < 2; i++) {
  int a = A[i];
  for (j = 0; j < 4; j++)
    C[j] = a + B[j];
}

This works because the local variable a is represented by a register, which can be read concurrently by as many operations as needed.
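To make the difference concrete, here is a minimal sketch that puts both forms side by side as plain C functions. The names add_reload, add_hoisted, N_OUTER and N_INNER are hypothetical and chosen only for this illustration; they are not part of ALADDIN. As written in the source, the first form reads A[i] from the array in every inner iteration (8 reads of A in total), while the second reads it once per outer iteration (2 reads) and reuses the scalar a. Both compute the same C.

```c
/* Hypothetical names for illustration only; sizes match the example above. */
enum { N_OUTER = 2, N_INNER = 4 };

/* Original form: A[i] is read from the array inside the inner loop,
 * so each unrolled inner iteration asks the array A for the same element. */
void add_reload(const int A[N_OUTER], const int B[N_INNER], int C[N_INNER]) {
    for (int i = 0; i < N_OUTER; i++)
        for (int j = 0; j < N_INNER; j++)
            C[j] = A[i] + B[j];
}

/* Hoisted form: A[i] is read once per outer iteration into the scalar a,
 * and the inner loop only touches B and C. */
void add_hoisted(const int A[N_OUTER], const int B[N_INNER], int C[N_INNER]) {
    for (int i = 0; i < N_OUTER; i++) {
        int a = A[i];
        for (int j = 0; j < N_INNER; j++)
            C[j] = a + B[j];
    }
}
```

With the hoisted form, only B and C are accessed inside the unrolled loop, so those are the arrays whose partitioning should still matter for the inner-loop accesses; A is read just once per outer iteration.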

Ok. I see.
But I have another question: is there a limit on the number of concurrent reads of a register in ALADDIN?

Nope - see my last sentence :)

Ohoh, I see.
Thanks a lot!