ut-parla/Parla.py

PArray coherence issue


When doing a reduce operation with PArrays, the data seen by follow-up operations is inconsistent or None.
The following example without a reduction works fine:

# exact import paths may differ between Parla versions
import numpy as np
from parla import Parla
from parla.cuda import gpu
from parla.parray import asarray
from parla.tasks import spawn, TaskSpace

def main():
    t = TaskSpace("tasks")
    a = TaskSpace("acc")

    n  = 4
    arr = asarray(np.zeros((n,256,256)))

    for i in range(n):
        @spawn(t[i], output=[arr[i]],placement=gpu)
        def task_a():
            arr[i] = 1

    for i in range(n):
        @spawn(t[n+i],dependencies=[t[:n]], input=[arr[i]],placement=gpu)
        def task_b():
            print(arr[i].mean())
    
if __name__ == "__main__":
    with Parla():
        main()

The outputs are 1.0, 1.0, 1.0, 1.0.

Adding a reduce operation leads to wrong values, often 0.5, 0.0, 0.0, 0.0:

def main():
    t = TaskSpace("tasks")
    a = TaskSpace("acc")

    n  = 4
    arr = asarray(np.zeros((n,256,256)))

    for i in range(n):
        @spawn(t[i], output=[arr[i]],placement=gpu)
        def task_a():
            arr[i] = 1

    @spawn(a[0],dependencies=[t[:n]], input=[arr],placement=gpu)
    def acc():
        print(arr.mean())

    for i in range(n):
        @spawn(t[n+i],dependencies=[a[0]], input=[arr[i]],placement=gpu)
        def task_b():
            print(arr[i].mean())
    
if __name__ == "__main__":
    with Parla():
        main()

When the acc task binds the array via inout instead, the values are sometimes None, which leads to runtime exceptions. This happens only on GPU, not on CPU.
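
For reference, the failing inout variant of the acc task is sketched below (reconstructed from the description above rather than copied from a separate reproducer):

@spawn(a[0], dependencies=[t[:n]], inout=[arr], placement=gpu)
def acc():
    print(arr.mean())  # sometimes raises at runtime because arr is None on GPU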

Thanks for reporting this issue. Fine-grained slicing in this branch has a known bug, which has been fixed in experiment-parla but has not been synced back to this repo yet. Will make a PR to bring the patch back.

#150 has been created, which solves the coherence bugs in PArray.

What's more, your second example doesn't work, since it violates PArray's restriction that forbids moving multiple overlapping subarrays through the system without a writeback. That also covers the same subarray on different devices (e.g. task_a might create arr[0] on gpu 0, but task_b will read arr[0] onto gpu 1). This is a TODO that will be supported in the next Parla release, but not yet.
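
For illustration, the cross-device case looks roughly like this (a standalone sketch, assuming this Parla version accepts indexed placements such as gpu(0) and gpu(1)):

s = TaskSpace("sketch")

@spawn(s[0], output=[arr[0]], placement=gpu(0))  # materializes the subarray arr[0] on gpu 0
def task_a():
    arr[0] = 1

@spawn(s[1], dependencies=[s[0]], input=[arr[0]], placement=gpu(1))  # reads the same subarray on gpu 1
def task_b():
    print(arr[0].mean())  # without an intermediate writeback, this violates the restriction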

So to make your example work in the current version, you need a writeback task that writes task_a's changes back to the CPU before the follow-up tasks run. It should look like this:

def main():
    t = TaskSpace("tasks")
    a = TaskSpace("acc")

    n  = 4
    arr = asarray(np.zeros((n,256,256)))

    for i in range(n):
        @spawn(t[i], output=[arr[i]],placement=gpu)
        def task_a():
            arr[i] = 1

    @spawn(a[0],dependencies=[t[:n]], inout=[arr],placement=gpu)  # here, inout is required to trigger writeback
    def writeback():
        pass # do nothing

    for i in range(n):
        @spawn(t[n+i],dependencies=[a[0]], input=[arr[i]],placement=gpu)
        def task_b():
            print(arr[i].mean())
    
if __name__ == "__main__":
    with Parla():
        main()

The changes have been merged into the main branch, and the tutorial has been updated accordingly. Will leave this issue open as an enhancement request to remove the requirement of a writeback task.

The merge does not resolve the issue, even with a writeback. Some objects are still None and the output is essentially the same.
Also, I think someone might have left some debug messages in the code because the program now prints the following:

write: NA::subarray::[0]
read: NA::subarray::[2]
write: NA::subarray::[3]
read: NA::subarray::[0]
write: NA::subarray::[1]
read: NA::subarray::[3]
read: NA::subarray::[1]
write: NA::subarray::[2]
write: NA

My bad, I have removed the debug strings; please pull the changes.

I have reproduced the bug, and it looks like there is still a bug in the current version of Parla but not in the new Parla. Will look into it.