Theano/libgpuarray

Build fails on 32-bit architectures

Closed this issue · 16 comments

In my effort to resurrect the packaging of libgpuarray / pygpu for Debian, I noticed the software fails to build on 32-bit architectures. See the following page for the logs. The errors are of the following nature:

/<<PKGBUILDDIR>>/src/util/integerfactoring.c:271:20: error: '__int128' is not supported on this target
  return ((unsigned __int128)a * (unsigned __int128)b) % m;

If 32-bit builds are not supported, then it may be wise to at least update the Build Requirements section of the docs appropriately, and have CMake fail early and not even attempt a build if the platform is 32-bit. Otherwise, please advise on a workaround for the build.

Cheers,

I will admit that 32-bit builds are sort of a second-tier platform for us.

The main reason is that such builds are of questionable utility, since there is no CUDA runtime for 32-bit platforms. However, there may be OpenCL builds so you could still do something useful.

I have no idea how to fix the build right now, but I'll look into it.

However, there may be OpenCL builds so you could still do something useful.

Indeed.

I have no idea how to fix the build right now, but I'll look into it.

Would it be possible to make the inclusion of 128-bit arithmetic conditional on whether the host platform supports it?
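For illustration, a guard of roughly this shape is what I have in mind (a sketch only; mulModU64 is a made-up name here, and the real fix may look different):

#include <stdint.h>

/* Sketch: use the 64x64->128 path only when the compiler provides __int128
 * (GCC/Clang define __SIZEOF_INT128__ when it is available), otherwise fall
 * back to a portable 64-bit double-and-add implementation. Requires m > 0. */
#if defined(__SIZEOF_INT128__)
static uint64_t mulModU64(uint64_t a, uint64_t b, uint64_t m){
    return (uint64_t)(((unsigned __int128)a * (unsigned __int128)b) % m);
}
#else
static uint64_t mulModU64(uint64_t a, uint64_t b, uint64_t m){
    uint64_t r = 0;
    a %= m;
    while(b){
        if(b & 1){
            r = (r >= m - a) ? r - (m - a) : r + a;  /* r = (r+a) mod m */
        }
        a = (a >= m - a) ? a - (m - a) : a + a;      /* a = 2a  mod m   */
        b >>= 1;
    }
    return r;
}
#endif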

I'm not sure. I'll take a look and figure it out.

Please have a look at the following bug report filed on the Debian BTS. There are potentially useful comments.

@obilaniu Fixed the problem with __int128 in #483

Once that is merged we will probably do a 0.6.9 soon-ish.

@ghisvail Why is my GCC inline assembler code broken? It certainly works when I try it. E.g.

gaIMulMod(0xBEEFDEADBEEFDEAD, 0xBEEFDEADBEEFDEAD, 0xDEADBEEFDEADBEEF) == 9006955852451488416

And if that multiplication were done in only 64 bits, the intermediate product would overflow and the result would totally blow up.

Moreover, that codepath is thoroughly exercised by check_util_integerfactoring; if my modular arithmetic code broke, then all the primality testing would also break, and so would the scheduler I rely upon (it tries to factor a problem of size N into the smallest M >= N with a factorization smooth enough to be split between block size, grid size and chunk size).
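As a toy illustration of that scheduling idea only (not the actual scheduler code), the smooth-rounding step amounts to something like:

#include <stdint.h>

/* Find the smallest M >= N whose prime factorization contains no factor
 * larger than maxFactor, so M can later be split across grid, block and
 * chunk dimensions. Purely illustrative; assumes maxFactor >= 2. */
static uint64_t smallestSmoothAtLeast(uint64_t N, uint64_t maxFactor){
    uint64_t M, r, f;
    for(M = N; ; M++){
        r = M;
        for(f = 2; f <= maxFactor && r > 1; f++){
            while(r % f == 0){
                r /= f;   /* strip out every factor <= maxFactor */
            }
        }
        if(r == 1){
            return M;     /* all prime factors of M were <= maxFactor */
        }
    }
}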

The x86-64 mul and div instructions multiply a 64-bit value by a 64-bit value into a 128-bit product, and divide a 128-bit dividend by a 64-bit divisor to get a 64-bit quotient and remainder. If the divisor is 0 or the quotient exceeds 2^64-1, SIGFPE is raised.

As the Intel SDMs say:

MUL r/m64  M  Valid  N.E.   Unsigned multiply (RDX:RAX ← RAX ∗ r/m64)
DIV r/m64  M  Valid  N.E.   Unsigned divide RDX:RAX by r/m64, with result stored in RAX ← Quotient, RDX ← Remainder
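In other words, on x86-64 the 64-bit modular multiply boils down to roughly the following (an illustrative sketch with a made-up name, not the exact inline assembly in integerfactoring.c):

#include <stdint.h>

/* MUL leaves the 128-bit product in RDX:RAX; DIV then divides RDX:RAX by m
 * and leaves the remainder in RDX. Reducing a and b mod m first keeps the
 * quotient below 2^64, so DIV cannot fault as long as m is non-zero.
 * x86-64 GCC/Clang only. */
static uint64_t mulModX64(uint64_t a, uint64_t b, uint64_t m){
    uint64_t lo, hi;
    a %= m;
    b %= m;
    lo = a;
    __asm__("mulq %2\n\t"   /* RDX:RAX = RAX * b                */
            "divq %3"       /* RAX = quotient, RDX = remainder  */
            : "+a"(lo), "=&d"(hi)
            : "rm"(b), "rm"(m)
            : "cc");
    return hi;              /* (a*b) mod m */
}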

@obilaniu no idea, ask the submitter of the bug.

@abergeron When do you think 0.6.9 will be released? I'd like to run it on the 32-bit Debian builders to check whether additional issues are found.

@ghisvail Could you perhaps try building the Git version as it stands now? My commits have removed the __int128 error and quietened the other warnings in the Debian build logs but I'd be curious to see what other issues might crop up and fix them before @abergeron cuts 0.6.9.

I've started building privately on my machine with the extra flags I see in your build logs. The one thing I can see that will possibly spawn a huge number of warnings is -Wdeclaration-after-statement; on GCC 6.2.1 I get thousands of warnings with that flag, all of them for files under tests/. The reason the i386 build logs don't show this is that the tests, if they're compiled at all, are compiled after the library, and the failed build never got that far. To build the tests, libcheck 0.11 is required.

Currently my flags are -Wall -Wextra -Wno-unused-parameter -Werror=format-security -Wdate-time -Wdeclaration-after-statement -fstack-protector-strong -D_FORTIFY_SOURCE=2 -std=gnu89.
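For reference, that warning fires on code like the following, which is valid C99 but not C90 (purely illustrative, not taken from tests/):

void example(void){
    int a = 1;
    a += 2;       /* a statement...                                        */
    int b = a;    /* ...followed by a declaration: exactly what            */
                  /* -Wdeclaration-after-statement flags under -std=gnu89  */
    (void)b;
}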

I'd be curious to see what other issues might crop up and fix them before @abergeron cuts 0.6.9.

I can try out a dummy build on i386 and arm.

The reason the i386 build logs don't show this is that the tests, if they're compiled, are compiled after the library

In my case, the tests are not compiled, since the builders don't have the necessary hardware to execute them. Unless an OpenCL implementation running on the CPU, such as pocl or Mesa, could be used?

@ghisvail OpenCL is explicitly supported, but OpenCL implementations are highly variable in quality. It might be worth a shot.

I can confirm the build now succeeds for i386 and arm* architectures. Thanks @obilaniu.

But OpenCL implementations are highly variable in quality

Indeed, pocl has made good strides and successfully ran the demanding test suite of arrayfire. I suspect things might be a bit worse for Mesa though.

Besides, is there an automatic way to query the ID of the OpenCL device that needs to be passed to the test suite?

@ghisvail The env var to be passed to the testsuite is DEVICE=openclN, where N maps to the N'th device ID in the array returned by clGetDeviceIDs().

Command-line tools like clinfo can help you view the devices available to you. Odds are opencl0 will be it.

The syntax for OpenCL would be DEVICE=openclN:N, since we have to specify platform and device.

However, if you don't have GPUs and only a single platform installed, opencl0:0 should do the trick.
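If it helps, a small standalone program along these lines will print the platform/device pairs in a form matching those indices (assuming the OpenCL headers and an ICD loader are installed, and that the enumeration order matches libgpuarray's; compile with -lOpenCL):

#include <stdio.h>
#include <CL/cl.h>

/* List every OpenCL platform/device pair as openclP:D plus the device name. */
int main(void){
    cl_platform_id plats[16];
    cl_uint nplats = 0, p, d;

    clGetPlatformIDs(16, plats, &nplats);
    if(nplats > 16) nplats = 16;
    for(p = 0; p < nplats; p++){
        cl_device_id devs[16];
        cl_uint ndevs = 0;
        clGetDeviceIDs(plats[p], CL_DEVICE_TYPE_ALL, 16, devs, &ndevs);
        if(ndevs > 16) ndevs = 16;
        for(d = 0; d < ndevs; d++){
            char name[256] = {0};
            clGetDeviceInfo(devs[d], CL_DEVICE_NAME, sizeof(name)-1, name, NULL);
            printf("opencl%u:%u  %s\n", p, d, name);
        }
    }
    return 0;
}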

Thanks, I'll try this out on the new release. Do you have an ETA for it by any chance?

0.6.9 will probably be out today or tomorrow, unless we find major problems.