m4rs-mt/ILGPU

Passing Int128 as kernel parameter is not working

MoFtZ opened this issue · 3 comments

MoFtZ commented

I expected the new Int128 data type to just work, even if not necessarily performant. However, I have found an unexpected issue.

The following kernel does not work on Cuda and generates the wrong output:

using System;
using ILGPU;
using ILGPU.Runtime;

class Program
{
    static void MyKernel(Index1D index, ArrayView<Int128> dataView, Int128 constant)
    {
        dataView[index] = index.X + constant;
    }

    static void Main()
    {
        using var context = Context.CreateDefault();
        foreach (var device in context)
        {
            using var accelerator = device.CreateAccelerator(context);
            var kernel = accelerator.LoadAutoGroupedStreamKernel<
                Index1D, ArrayView<Int128>, Int128>(MyKernel);

            using var buffer = accelerator.Allocate1D<Int128>(1024);
            kernel((int)buffer.Length, buffer.View, 42);

            var data = buffer.GetAsArray1D();
            for (int i = 0, e = data.Length; i < e; ++i)
            {
                if (data[i] != 42 + i)
                    Console.WriteLine($"Error at element location {i}: {data[i]} found");
            }
        }
    }
}

Expected Output:

data[0] = { Lower = 42, Upper = 0 }
data[1] = { Lower = 43, Upper = 0 }
data[2] = { Lower = 44, Upper = 0 }
etc

Actual Output on Cuda:

data[0] = { Lower = 0, Upper = 42 }
data[1] = { Lower = 1, Upper = 42 }
data[2] = { Lower = 2, Upper = 42 }
etc

m4rs-mt commented

@MoFtZ thanks for reporting this. A quick investigation suggests that 64-bit additions with carry are not mapped properly to the data structure.

MoFtZ commented

@m4rs-mt It currently looks like an issue with how the kernel launcher marshals the Int128 parameter. Int128 operations performed entirely within the kernel appear to work as expected. The issue only appears when the supplied kernel parameter is used.
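
To illustrate how a marshaling fault could produce exactly the reported output, here is a minimal host-only sketch. It does not use ILGPU. The buffer size and both offsets (16 for the writer, 8 for the reader) are hypothetical values chosen for illustration, not ILGPU's actual parameter layout. It assumes the 128-bit value is stored as two little-endian 64-bit words, lower word first.

```csharp
using System;
using System.Buffers.Binary;

static class OffsetMismatchSketch
{
    // Writes a 128-bit value as two little-endian 64-bit words, lower word first.
    static void Write128(Span<byte> buffer, int offset, ulong lower, ulong upper)
    {
        BinaryPrimitives.WriteUInt64LittleEndian(buffer.Slice(offset, 8), lower);
        BinaryPrimitives.WriteUInt64LittleEndian(buffer.Slice(offset + 8, 8), upper);
    }

    // Reads a 128-bit value back as (Lower, Upper) from the given offset.
    static (ulong Lower, ulong Upper) Read128(ReadOnlySpan<byte> buffer, int offset) =>
        (BinaryPrimitives.ReadUInt64LittleEndian(buffer.Slice(offset, 8)),
         BinaryPrimitives.ReadUInt64LittleEndian(buffer.Slice(offset + 8, 8)));

    public static (ulong Lower, ulong Upper) Simulate()
    {
        // Hypothetical parameter buffer; new arrays are zero-initialized.
        var parameterBuffer = new byte[48];

        // Hypothetical: the writer places the constant 42 at offset 16 ...
        Write128(parameterBuffer, 16, lower: 42, upper: 0);

        // ... while the reader expects it 8 bytes earlier, at offset 8.
        return Read128(parameterBuffer, 8);
    }

    static void Main()
    {
        var (lower, upper) = Simulate();
        // The reader sees the zero bytes before the value as Lower
        // and the real lower word as Upper.
        Console.WriteLine($"Lower = {lower}, Upper = {upper}"); // Lower = 0, Upper = 42
    }
}
```

Adding the index to a constant read this way only changes the lower word, which would give Lower = index and Upper = 42, the same pattern as the actual Cuda output above.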

MoFtZ commented

@m4rs-mt OK, so I have confirmed that this is definitely caused by the kernel parameter marshaling.

ILGPU is not taking any structure padding/alignment into account.
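
As a rough sketch of what taking alignment into account means when laying out parameters, the helper below computes field offsets with and without padding. The names `AlignUp` and `ComputeOffsets` are hypothetical and not part of ILGPU. The field sizes and the 16-byte alignment for the 128-bit value are assumptions made for illustration.

```csharp
using System;

static class ParameterLayoutSketch
{
    // Rounds offset up to the next multiple of alignment.
    // The alignment must be a power of two.
    public static int AlignUp(int offset, int alignment) =>
        (offset + alignment - 1) & ~(alignment - 1);

    // Computes the start offset of each field, optionally inserting padding
    // so that every field starts at a multiple of its alignment.
    public static int[] ComputeOffsets(
        (int Size, int Alignment)[] fields,
        bool honorAlignment)
    {
        var offsets = new int[fields.Length];
        int cursor = 0;
        for (int i = 0; i < fields.Length; ++i)
        {
            if (honorAlignment)
                cursor = AlignUp(cursor, fields[i].Alignment);
            offsets[i] = cursor;
            cursor += fields[i].Size;
        }
        return offsets;
    }

    static void Main()
    {
        // Hypothetical fields: an 8-byte value followed by a 16-byte value
        // that is assumed to require 16-byte alignment.
        var fields = new[]
        {
            (Size: 8, Alignment: 8),
            (Size: 16, Alignment: 16),
        };

        Console.WriteLine(string.Join(", ", ComputeOffsets(fields, honorAlignment: false))); // 0, 8
        Console.WriteLine(string.Join(", ", ComputeOffsets(fields, honorAlignment: true)));  // 0, 16
    }
}
```

If one side of the launch packs the fields tightly and the other side pads them, the two sides disagree by 8 bytes on where the 128-bit value starts, so its two 64-bit halves are read from the wrong positions.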