PyGPU tests fail with cuLinkAddData: CUDA_ERROR_UNKNOWN
Hello,
I'm on Ubuntu 16.04 and have a GeForce GTX 1080 with the CUDA 8.0 runtime installed.
I also installed cuDNN 5.1.
The linker finds all my CUDA libraries (and cuDNN) on its search path.
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1080"
CUDA Driver Version / Runtime Version 9.0 / 8.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 8114 MBytes (8508145664 bytes)
(20) Multiprocessors, (128) CUDA Cores/MP: 2560 CUDA Cores
GPU Max Clock rate: 1734 MHz (1.73 GHz)
Memory Clock rate: 5005 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 3 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1080
Result = PASS
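For anyone comparing setups, the output above reports driver version 9.0 against runtime version 8.0. A driver newer than the runtime is normally a supported combination, so this is a data point rather than a diagnosis. A minimal sketch for pulling the two numbers out of saved deviceQuery output follows; the function name `parse_cuda_versions` is made up for illustration and is not part of any CUDA or libgpuarray tooling.

```python
import re

def parse_cuda_versions(device_query_output):
    """Return (driver, runtime) version strings from deviceQuery text, or None.

    Hypothetical helper: it only looks for the summary line shown in the log above.
    """
    pattern = r"CUDA Driver Version / Runtime Version\s+(\d+\.\d+)\s*/\s*(\d+\.\d+)"
    match = re.search(pattern, device_query_output)
    if match is None:
        return None
    return match.group(1), match.group(2)

# Line copied from the deviceQuery output in this issue.
sample = "CUDA Driver Version / Runtime Version 9.0 / 8.0"
versions = parse_cuda_versions(sample)
driver, runtime = versions
print("driver:", driver, "runtime:", runtime)
```

Recording both values before and after a reinstall makes it easier to see what actually changed on the machine.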
I compiled libgpuarray manually, following the instructions:
mkdir Build
cd Build
# you can pass -DCMAKE_INSTALL_PREFIX=/path/to/somewhere to install to an alternate location
cmake .. -DCMAKE_BUILD_TYPE=Release # or Debug if you are investigating a crash
make
make install
cd ..
# This must be done after libgpuarray is installed as per instructions above.
python setup.py build
python setup.py install --user
Running the C unit tests, four of the eleven fail:
DEVICE=cuda0 make test
Running tests...
Test project /home/user/development/libgpuarray/build
Start 1: test_types
1/11 Test #1: test_types ....................... Passed 0.00 sec
Start 2: test_util
2/11 Test #2: test_util ........................ Passed 0.00 sec
Start 3: test_util_integerfactoring
3/11 Test #3: test_util_integerfactoring ....... Passed 0.48 sec
Start 4: test_reduction
4/11 Test #4: test_reduction ................... Passed 6.95 sec
Start 5: test_array
5/11 Test #5: test_array ....................... Passed 3.46 sec
Start 6: test_blas
6/11 Test #6: test_blas ........................***Failed 4.13 sec
Start 7: test_elemwise
7/11 Test #7: test_elemwise ....................***Failed 23.39 sec
Start 8: test_error
8/11 Test #8: test_error ....................... Passed 0.00 sec
Start 9: test_buffer
9/11 Test #9: test_buffer ...................... Passed 4.20 sec
Start 10: test_buffer_collectives
10/11 Test #10: test_buffer_collectives ..........***Failed 1.02 sec
Start 11: test_collectives
11/11 Test #11: test_collectives .................***Failed 1.05 sec
64% tests passed, 4 tests failed out of 11
Total Test time (real) = 44.69 sec
The following tests FAILED:
6 - test_blas (Failed)
7 - test_elemwise (Failed)
10 - test_buffer_collectives (Failed)
11 - test_collectives (Failed)
Errors while running CTest
Makefile:127: recipe for target 'test' failed
make: *** [test] Error 8
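When comparing runs before and after a rebuild, it can help to extract just the failing test names from a saved `make test` log. The sketch below assumes the CTest line layout shown above (`N/M Test #N: name ....***Failed`); `failed_ctest_names` is a hypothetical helper, not something CTest or libgpuarray provides.

```python
import re

# Matches lines such as:
#   6/11 Test #6: test_blas ........................***Failed 4.13 sec
FAILED_LINE = re.compile(r"^\s*\d+/\d+\s+Test\s+#\d+:\s+(\S+)\s+\.*\s*\*\*\*Failed")

def failed_ctest_names(ctest_output):
    """Return the names of tests that CTest marked as ***Failed, in log order."""
    names = []
    for line in ctest_output.splitlines():
        match = FAILED_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names

# Three lines copied from the log in this issue.
sample_log = """\
5/11 Test #5: test_array ....................... Passed 3.46 sec
6/11 Test #6: test_blas ........................***Failed 4.13 sec
10/11 Test #10: test_buffer_collectives ..........***Failed 1.02 sec
"""
print(failed_ctest_names(sample_log))
```

On the full log above this would list `test_blas`, `test_elemwise`, `test_buffer_collectives` and `test_collectives`, matching CTest's own summary.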
My .theanorc file looks like this:
[global]
floatX = float32
device = cuda0
mode = FAST_RUN
[blas]
ldflags = -lopenblas -lgfortran
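To double-check which values a config like this actually contains, the file can be read as INI text. This is only a sketch using Python's standard `configparser` on a copy of the contents above; it does not go through Theano's own config machinery, and `read_theano_settings` is a hypothetical name.

```python
import configparser

# Contents copied from the .theanorc shown in this issue.
SAMPLE = """\
[global]
floatX = float32
device = cuda0
mode = FAST_RUN
[blas]
ldflags = -lopenblas -lgfortran
"""

def read_theano_settings(text):
    """Parse INI-style text and return the settings relevant to this issue."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # fallback=None keeps the lookup from raising if a section or key is absent.
    return {
        "device": parser.get("global", "device", fallback=None),
        "floatX": parser.get("global", "floatX", fallback=None),
        "ldflags": parser.get("blas", "ldflags", fallback=None),
    }

settings = read_theano_settings(SAMPLE)
print(settings)
```

A `None` in the result would point at a missing or misspelled section or key.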
When I run the Python tests, many of them fail. The log is attached as a text file.
It seems to boil down to a problem with BLAS and with some CUDA kernels not compiling;
most of the failures report cuLinkAddData: CUDA_ERROR_UNKNOWN.
OK, I recompiled; the C tests give the same output. The Python tests give the following, more detailed log:
log.txt
I re-installed CUDA 9 and libnccl 2 and recompiled everything; all tests pass now.
One thing bugged me, though: you have to leave the libgpuarray directory to run the tests. That tripped me up for a while.
In this section:
To run the python tests, install pygpu, then move outside its directory and run this command
I'd suggest making it bold that you have to move outside the libgpuarray directory.
Also, this is very subtle, but the compilation instructions contain an easy-to-miss "cd" command:
python setup.py build
python setup.py install --user
cd
DEVICE="<test device>" python -c "import pygpu;pygpu.test()"
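A likely reason the `cd` matters, stated here as an assumption rather than something confirmed in this thread: with `python -c`, the current directory is searched first on import, so a `pygpu` folder in the source checkout can be picked up instead of the installed package. A small sketch of a pre-flight check follows; `local_package_shadows` is a hypothetical helper.

```python
import os

def local_package_shadows(package_name, directory):
    """Return True if `directory` holds a folder or module named `package_name`.

    Such an entry would be found before the installed package when Python is
    started from `directory`.
    """
    package_dir = os.path.join(directory, package_name)
    module_file = os.path.join(directory, package_name + ".py")
    return os.path.isdir(package_dir) or os.path.isfile(module_file)

if local_package_shadows("pygpu", os.getcwd()):
    print("A local 'pygpu' is on the import path; cd elsewhere before running the tests.")
else:
    print("No local 'pygpu' here; 'import pygpu' should resolve to the installed package.")
```

Running it from the libgpuarray checkout and again from the home directory shows the difference the `cd` makes.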
I added some bold in a new PR.