Multiplying two matrices - not working in 3D range
ZahariStoyanov opened this issue · 1 comments
Currently working on a Java library for matrix operations and ML. I use Aparapi to utilize the GPU.
I've written this code to multiply two matrices:
public static NDmatrix matMul(float[][] a, float[][] b) {
int[] aDim = new int[]{a.length, a[0].length};
int[] bDim = new int[]{b.length, b[0].length};
if(aDim[1] == bDim[0]){
int[] Dim = new int[]{aDim[0], bDim[1]};
int aVSize = aDim[0] * aDim[1];
float[] aVector = new float[aVSize];
for(int i = 0; i < aDim[0]; i++)
System.arraycopy(a[i], 0, aVector, i * aDim[1], aDim[1]);
int bVSize = bDim[0] * bDim[1];
float[] bVector = new float[bVSize];
for(int i = 0; i < bDim[0]; i++)
System.arraycopy(b[i], 0, bVector, i * bDim[1], bDim[1]);
int resVSize = Dim[0] * Dim[1];
float[] resVector = new float[resVSize];
int d[] = new int[]{aDim[1]};
Kernel mKernel = new Kernel() {
final int ht = Dim[0];
final int wt = Dim[1];
final int dpt = d[0];
public void run() {
int c = getGlobalId(0);
int r = getGlobalId(1);
int l = getGlobalId(2); // used with 3D range
localBarrier(); // used with 3D range
//for(int l = 0; l < dpt; l++) // used with 2D range
resVector[r * wt + c] += aVector[r * dpt + l] * bVector[l * wt + c]; // in 3D range, this only works for l = 0
}
};
mKernel.setExplicit(true);
mKernel.put(aVector);
mKernel.put(bVector);
mKernel.put(resVector);
mKernel.put(Dim);
mKernel.put(d);
mKernel.execute(Range.create3D(Dim[1], Dim[0], d[0]));
//mKernel.execute(Range.create2D(Dim[1], Dim[0]));
mKernel.get(resVector);
mKernel.dispose();
return new NDmatrix(Dim, resVector, null);
}
System.out.println("The number of columns in left matrix and number of rows in right matrix do not match.");
System.out.println();
return null;
}
However, it looks like resVector[some_index] only gets updated once(r + aVector[0 * dpt + 0] * bVector[0 * wt + c]). If, instead, I use a 2D range and a loop(the commented bits in the code), it works correctly. What could be the reason for such behavior and how can I force it to work fully parallel?
Interestingly, I tried one thing that "worked" - after updating resVector[some_index], I called this.put(resVector). However, it then couldn't compile in OpenCL and ended up using Java's multi-threading instead, eventually resulting in a correct result.
So... I added this loop in conjunction with the 3D range and it works:
[...]
for(int i = 0; i < dpt; i++)
if(i == l) resVector[r * wt + c] += aVector[r * dpt + l] * bVector[l * wt + c];
[...]
Why would it need such a bizarre force? What am I missing?