PyGPU tests fail with cuLinkAddData: CUDA_ERROR_UNKNOWN

Question


Closed this issue 7 years ago · 7 comments

Hello,

I'm on Ubuntu 16.04 and have a GeForce GTX 1080 with the CUDA 8.0 runtime installed.
I also installed cuDNN 5.1.
My CUDA libraries (and cuDNN) are all found on the linker path.
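If you want to double-check that last point from Python, a minimal sketch using the standard library's ctypes.util.find_library could look like the block below. The list of library base names is an assumption for illustration. find_library reports the name the loader would use (often a soname such as libcudart.so.8.0), not necessarily a full path, and returns None when it finds nothing.

```python
import ctypes.util

# Base names to look up: "cuda" is the driver library, "cudart" and "cudnn"
# are the libraries mentioned above, "nvrtc" is an assumed extra.
CUDA_LIBS = ("cuda", "cudart", "cudnn", "nvrtc")


def locate_libs(names=CUDA_LIBS):
    """Map each base name to what ctypes.util.find_library reports, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    for name, found in locate_libs().items():
        print(f"{name:8} -> {found or 'NOT FOUND'}")
```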

/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1080"
  CUDA Driver Version / Runtime Version          9.0 / 8.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 8114 MBytes (8508145664 bytes)
  (20) Multiprocessors, (128) CUDA Cores/MP:     2560 CUDA Cores
  GPU Max Clock rate:                            1734 MHz (1.73 GHz)
  Memory Clock rate:                             5005 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 3 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1080
Result = PASS
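The final summary line is the quickest place to read the driver and runtime versions. A small sketch that parses it, assuming the "key = value" pairs are comma-separated as above (so it would break on a device name containing a comma); the function names are hypothetical:

```python
def parse_device_query_summary(line):
    """Turn deviceQuery's final summary line into a {key: value} dict."""
    # Drop the leading "deviceQuery," label, then split the key = value pairs.
    body = line.split(",", 1)[1] if line.startswith("deviceQuery,") else line
    fields = {}
    for part in body.split(","):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments that contain no "="
            fields[key.strip()] = value.strip()
    return fields


def driver_runtime_versions(fields):
    """Return (driver, runtime) version strings from the parsed summary."""
    return fields["CUDA Driver Version"], fields["CUDA Runtime Version"]


if __name__ == "__main__":
    summary = ("deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, "
               "CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1080")
    driver, runtime = driver_runtime_versions(parse_device_query_summary(summary))
    print(f"driver {driver}, runtime {runtime}, same: {driver == runtime}")
```

A driver newer than the runtime is normally a supported combination, so a difference is not an error by itself; it is just worth noting here because the problem went away later in the thread after reinstalling CUDA 9.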

I compiled libgpuarray manually using these instructions:

mkdir Build
cd Build
# you can pass -DCMAKE_INSTALL_PREFIX=/path/to/somewhere to install to an alternate location
cmake .. -DCMAKE_BUILD_TYPE=Release # or Debug if you are investigating a crash
make
make install
cd ..
# This must be done after libgpuarray is installed as per instructions above.
python setup.py build
python setup.py install --user
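The same steps, scripted with Python's subprocess module as a sketch that only prints the commands unless dry_run=False. The helper names are hypothetical. Passing build_type="Debug" corresponds to the Debug rebuild suggested further down the thread. With CMake's default install prefix (/usr/local on Linux), make install usually needs root.

```python
import os
import subprocess
import sys


def build_commands(build_type="Release", install_prefix=None):
    """Return the build steps above as (subdirectory, argv) pairs,
    relative to the root of a libgpuarray checkout."""
    cmake = ["cmake", "..", f"-DCMAKE_BUILD_TYPE={build_type}"]
    if install_prefix is not None:
        cmake.append(f"-DCMAKE_INSTALL_PREFIX={install_prefix}")
    return [
        ("Build", cmake),
        ("Build", ["make"]),
        ("Build", ["make", "install"]),
        # sys.executable: use the same interpreter that runs this script.
        (".", [sys.executable, "setup.py", "build"]),
        (".", [sys.executable, "setup.py", "install", "--user"]),
    ]


def run_build(checkout, build_type="Release", dry_run=True):
    """Print each step; also execute it unless dry_run is True."""
    for subdir, argv in build_commands(build_type):
        cwd = os.path.join(checkout, subdir)
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            os.makedirs(cwd, exist_ok=True)  # the "mkdir Build" step
            subprocess.run(argv, cwd=cwd, check=True)  # stop at first failure


if __name__ == "__main__":
    run_build(".", build_type="Debug")  # dry run: only prints the steps
```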

Running the C unit tests, the following tests fail:


DEVICE=cuda0 make test
Running tests...
Test project /home/user/development/libgpuarray/build
      Start  1: test_types
 1/11 Test  #1: test_types .......................   Passed    0.00 sec
      Start  2: test_util
 2/11 Test  #2: test_util ........................   Passed    0.00 sec
      Start  3: test_util_integerfactoring
 3/11 Test  #3: test_util_integerfactoring .......   Passed    0.48 sec
      Start  4: test_reduction
 4/11 Test  #4: test_reduction ...................   Passed    6.95 sec
      Start  5: test_array
 5/11 Test  #5: test_array .......................   Passed    3.46 sec
      Start  6: test_blas
 6/11 Test  #6: test_blas ........................***Failed    4.13 sec
      Start  7: test_elemwise
 7/11 Test  #7: test_elemwise ....................***Failed   23.39 sec
      Start  8: test_error
 8/11 Test  #8: test_error .......................   Passed    0.00 sec
      Start  9: test_buffer
 9/11 Test  #9: test_buffer ......................   Passed    4.20 sec
      Start 10: test_buffer_collectives
10/11 Test #10: test_buffer_collectives ..........***Failed    1.02 sec
      Start 11: test_collectives
11/11 Test #11: test_collectives .................***Failed    1.05 sec

64% tests passed, 4 tests failed out of 11

Total Test time (real) =  44.69 sec

The following tests FAILED:
	  6 - test_blas (Failed)
	  7 - test_elemwise (Failed)
	 10 - test_buffer_collectives (Failed)
	 11 - test_collectives (Failed)
Errors while running CTest
Makefile:127: recipe for target 'test' failed
make: *** [test] Error 8
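To pull the failing test names out of output like this (for example, to compare runs before and after a rebuild), a small regex-based sketch. It only recognizes the Passed/Failed result lines shown above; other CTest statuses, such as timeouts, are ignored.

```python
import re

# One CTest result line, e.g.
#  6/11 Test  #6: test_blas ........................***Failed    4.13 sec
RESULT_RE = re.compile(r"Test\s+#(\d+):\s+(\S+)\s+\.*\s*(?:\*\*\*)?(Passed|Failed)")


def summarize_ctest(output):
    """Split CTest result lines into (passed, failed) lists of test names."""
    passed, failed = [], []
    for match in RESULT_RE.finditer(output):
        name, status = match.group(2), match.group(3)
        (failed if status == "Failed" else passed).append(name)
    return passed, failed
```

Running ctest --output-on-failure from the build directory (with DEVICE set) also prints each failing test's own output, which usually says more than the summary.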

My .theanorc file looks like this:

[global]
floatX = float32
device = cuda0
mode = FAST_RUN

[blas]
ldflags = -lopenblas -lgfortran
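A quick way to confirm the file parses as expected, and to get an equivalent THEANO_FLAGS string for trying settings without editing the file. This uses Python's configparser rather than Theano's own loader, so it is only a sketch: it assumes [global] options map to bare flag names and options in other sections to section.option names, and it does not model Theano's precedence rules.

```python
import configparser
import os


def parse_theanorc(text):
    """Parse .theanorc-style INI text."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option case: the flag is floatX, not floatx
    cfg.read_string(text)
    return cfg


def as_flags(cfg):
    """Render the config as a THEANO_FLAGS-style string: [global] options
    bare, options from other sections as section.option."""
    parts = []
    for section in cfg.sections():
        for key, value in cfg[section].items():
            name = key if section == "global" else f"{section}.{key}"
            parts.append(f"{name}={value}")
    return ",".join(parts)


if __name__ == "__main__":
    path = os.path.expanduser("~/.theanorc")
    if os.path.exists(path):
        with open(path) as f:
            print(as_flags(parse_theanorc(f.read())))
```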

When running the Python tests, many of them fail. The log is attached as a text file.
It seems to boil down to a problem with BLAS and some CUDA kernels not compiling; most
of the time the failure is cuLinkAddData: CUDA_ERROR_UNKNOWN.

log.txt (https://github.com/Theano/libgpuarray/files/1414493/log.txt)
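Counting which driver calls fail, and with which error code, can make a long test log easier to read. A minimal sketch, assuming the messages appear in the "call: CUDA_ERROR_..." form quoted above; the function name is hypothetical.

```python
import re
from collections import Counter

# "<driver API call>: CUDA_ERROR_<NAME>", e.g. "cuLinkAddData: CUDA_ERROR_UNKNOWN"
ERROR_RE = re.compile(r"\b(cu[A-Za-z0-9_]+): (CUDA_ERROR_[A-Z_]+)")


def tally_cuda_errors(log_text):
    """Count (call, error) pairs found anywhere in the log text."""
    return Counter(ERROR_RE.findall(log_text))
```

For example, tally_cuda_errors(open("log.txt").read()).most_common() lists the most frequent failures first.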

Answer 1 · 2017-10-25T13:03:12.000Z

Can you recompile by redoing all the installation commands, but changing
cmake .. -DCMAKE_BUILD_TYPE=Release # or Debug if you are investigating a crash
to
cmake .. -DCMAKE_BUILD_TYPE=Debug
to get more information?


Answer 2 · 2017-10-25T13:28:35.000Z

OK, I recompiled. The C tests give the same output. The Python tests produce the following, more detailed log:
log.txt

Answer 3 · 2017-10-27T13:29:31.000Z

I reinstalled CUDA 9 and libnccl 2 and recompiled everything; all tests pass now.
One thing that tripped me up, though: you have to leave the libgpuarray directory to run the tests. This had me stuck for a while.
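The most likely reason, as far as I understand it: with python -c, the current directory comes first on sys.path, and the libgpuarray checkout contains a pygpu/ source package, so import pygpu picks up that source tree instead of the installed package with its compiled extensions. A hypothetical helper that checks for this kind of shadowing (it only looks for .py modules and regular packages, not compiled extension files):

```python
import os
import sys


def shadowing_path(package, directory=None):
    """Return the local module file or package directory named `package` in
    `directory` (default: cwd), or None. If `directory` is first on sys.path,
    as the cwd is with python -c, this copy is imported before an installed one."""
    directory = os.getcwd() if directory is None else directory
    pkg_dir = os.path.join(directory, package)
    # A regular package is a directory containing __init__.py.
    if os.path.isfile(os.path.join(pkg_dir, "__init__.py")):
        return pkg_dir
    module = os.path.join(directory, package + ".py")
    if os.path.isfile(module):
        return module
    return None


if __name__ == "__main__":
    hit = shadowing_path("pygpu")
    if hit:
        sys.exit(f"'import pygpu' here would load {hit}; cd elsewhere first")
    print("no local pygpu here")
```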

Answer 4 · 2017-10-27T14:13:29.000Z

Where do you think we could document this better? It is in the section I go to when I copy/paste the compilation commands, but you probably checked a different part of the docs.


Answer 5 · 2017-10-27T14:18:04.000Z

In this section (http://deeplearning.net/software/libgpuarray/installation.html#running-tests):
To run the python tests, install pygpu, then move outside its directory and run this command

I'd suggest putting in bold that you have to move outside the libgpuarray directory.

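Also, and this is very subtle, the compilation instructions (http://deeplearning.net/software/libgpuarray/installation.html#step-by-step-install-user-library) contain an easy-to-miss "cd" command: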

python setup.py build
python setup.py install --user
cd
DEVICE="<test device>" python -c "import pygpu;pygpu.test()"
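The bare cd moves to your home directory, which works because that directory has no pygpu/ source tree in it. An alternative sketch that runs the same test command from an empty temporary directory instead; the function names are hypothetical, and the return code may not reflect test failures, depending on how pygpu.test() reports them.

```python
import os
import subprocess
import sys
import tempfile


def pygpu_test_command(device):
    """Return argv and env mirroring the DEVICE=... python -c test command."""
    argv = [sys.executable, "-c", "import pygpu;pygpu.test()"]
    env = dict(os.environ, DEVICE=device)
    return argv, env


def run_pygpu_tests(device="cuda0"):
    """Run the pygpu tests from an empty temporary directory, so no pygpu/
    source tree in the current directory can shadow the installed package."""
    argv, env = pygpu_test_command(device)
    with tempfile.TemporaryDirectory() as workdir:
        return subprocess.run(argv, cwd=workdir, env=env).returncode
```

For example, run_pygpu_tests("cuda0") corresponds to the command above with DEVICE="cuda0".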

Answer 6 · 2017-10-27T20:33:52.000Z

It seems like a good change. Do you want to make a PR? We can do it if you prefer. Thanks.


Answer 7 · 2018-01-10T17:00:51.000Z

Added some bold text in a new PR.