Does the memory in ALADDIN support a broadcast mechanism?
cylinbao opened this issue · 4 comments
Suppose an accelerator has multiple PEs that share a local buffer.
Can the buffer (memory) in ALADDIN support a broadcast mechanism, so that I don't need to partition it to get correct behavior?
What I want to do is something like this:
int A[2];
int B[4];
int C[4];
int i, j;
for (i = 0; i < 2; i++)
    for (j = 0; j < 4; j++)
        C[j] = A[i] + B[j];
I want to unroll the inner for-loop by a factor of 4, but the value read from array A is the same across all of its iterations. In my tests, I still need to partition all three arrays (A, B, and C) by a factor of 4 to get the correct performance behavior. So I would like to confirm whether ALADDIN supports data broadcasting.
Thanks.
Just save A[i] into a local variable and use that in the innermost for loop.
for (i = 0; i < 2; i++) {
    int a = A[i];
    for (j = 0; j < 4; j++)
        C[j] = a + B[j];
}
The reason this works is that a is represented by a register, which can be read concurrently by as many operations as needed.
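For anyone who wants to check that hoisting the load does not change the result, here is a minimal sketch in plain C that runs both forms side by side. The function names (add_reload, add_hoisted, same_result) and the NA/NB macros are made up for illustration and are not part of ALADDIN. The sketch only shows functional equivalence on the host; it does not model ALADDIN's scheduling or memory ports.

```c
#include <string.h>

#define NA 2  /* length of A, as in the question */
#define NB 4  /* length of B and C, as in the question */

/* Original form: A[i] is read from the array in every inner iteration. */
void add_reload(const int A[NA], const int B[NB], int C[NB]) {
    for (int i = 0; i < NA; i++)
        for (int j = 0; j < NB; j++)
            C[j] = A[i] + B[j];
}

/* Hoisted form: A[i] is read once per outer iteration into a scalar,
 * and the inner loop only uses that scalar. */
void add_hoisted(const int A[NA], const int B[NB], int C[NB]) {
    for (int i = 0; i < NA; i++) {
        int a = A[i];
        for (int j = 0; j < NB; j++)
            C[j] = a + B[j];
    }
}

/* Hypothetical helper: returns 1 if both forms produce the same C. */
int same_result(const int A[NA], const int B[NB]) {
    int c1[NB], c2[NB];
    add_reload(A, B, c1);
    add_hoisted(A, B, c2);
    return memcmp(c1, c2, sizeof c1) == 0;
}
```

Calling same_result with any A and B should return 1, since both loops leave C[j] equal to A[1] + B[j] after the last outer iteration. The only source-level difference is that the hoisted form reads A once per outer iteration instead of once per inner iteration.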
OK, I see.
But I have another question: is there a limit on the number of concurrent reads of a register in ALADDIN?
Nope - see my last sentence :)
Oh, I see.
Thanks a lot!