fjarri/reikna

ctype incorrectly rendering np.int64 as "long" instead of "long long"

drtpotter opened this issue · 4 comments

Hi Bogdan,

I've encountered an issue with reikna.cluda.dtypes.ctype on Windows when using the CUDA API (reikna.cluda.cuda_api()). The code

import numpy as np
from reikna.cluda.dtypes import ctype
print(ctype(np.int64))

prints "long", which is fine for compiling kernels using the Reikna OpenCL api on Windows and both OpenCL and CUDA Reikna api's on Linux. However when I use the Microsoft Visual Studio compiler with the CUDA api on Windows, then ctype(np.int64) should evaluate to "long long" instead of "long" because the Visual Studio compiler interprets "long" as a 4 byte integer.

Here is some code where this issue causes an incorrect result when using the Reikna CUDA API on Windows.

import numpy as np
from reikna import cluda
from reikna.cluda.dtypes import ctype
from reikna.cluda import functions

# Create the CUDA API
api=cluda.cuda_api()

# Get the first device of the first platform
device=api.get_platforms()[0].get_devices()[0]
print(device)

# Create a Thread on the device
thr=api.Thread(device)

# Shape of the simple array
shape=(2,2)
local_size=(1,1)
global_size=shape

# Make an array on the compute device
I64_dev=thr.array(shape=shape, dtype=np.int64)

# Data types to be rendered into the kernel template
render_kwds=dict(i32_t=ctype(np.int32), 
                 i64_t=ctype(np.int64))

# Compile the kernel source
program=thr.compile('''
    // standard kernel with different data types and template rendering
    KERNEL void test ( 
                        GLOBAL_MEM ${i64_t}* I64,
                        ${i64_t} value,
                        ${i32_t} len0,
                        ${i32_t} len1) { 
        
        // i0 and i1 are the array coordinates (C ordering)
        SIZE_T i0=get_global_id(0); 
        SIZE_T i1=get_global_id(1); 
        
        // Only run the compute if it falls within a certain range
        if ((i0<len0) && (i1<len1)) {
            
            // Compute the 1D offset into the data using dot product striding notation
            SIZE_T offset=i0*len1+i1;
            
            // Test set of arguments
            I64[offset]=value; 
        }
    } 
''', render_kwds=render_kwds, compiler_options=[], fast_math=True)

# Execute the code 
program.test(   I64_dev,
                np.int64(10),
                np.int32(shape[0]),
                np.int32(shape[1]),
                # Specify the local size and global size at runtime
                local_size=local_size,
                global_size=global_size
             )

print(I64_dev.get())
    
# Release resources
thr.release()

I am using Microsoft Visual Studio 14.0 on Windows 10 with CUDA 9.2, and I get all zeros in the result. If I switch to the OpenCL API, I get the expected result of 10 in every element of the array.
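For reference, switching the example above to OpenCL only means changing the API selection (assuming PyOpenCL is installed); the rest of the code is unchanged:

# Use the OpenCL backend instead of CUDA
api = cluda.ocl_api()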

Actually, there is already code that should return the correct type, but there is a typo in it: it checks the value of platform.system instead of platform.system(). Before I fix it, could you check whether import platform; platform.system() returns "Windows" for you? I don't have a Windows machine handy.
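A minimal sketch of the kind of check described above (the helper name is illustrative, not reikna's actual internals): the bound function platform.system is always truthy and never equals "Windows", so it has to be called:

import platform

def _int64_ctype():
    # Hypothetical helper: platform.system without parentheses is a function
    # object, so comparing it to "Windows" always fails; it must be called.
    if platform.system() == "Windows":
        return "long long"  # MSVC treats "long" as a 4-byte integer
    return "long"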

Should be fixed by commit cac7e39, try it out.
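A quick way to verify after updating (on Windows this should now print "long long"):

import numpy as np
from reikna.cluda.dtypes import ctype

print(ctype(np.int64))  # expected: "long long" on Windows, "long" on Linux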

Ok, closing for now, please reopen if there's still a problem.