Error while testing library
Hi
After installing kblas for arch sm_62 with CUDA 10.2 and running make in testing, I ran "./test_dtrmm -N 200:512" in /testing/bin, which gave me the following error:
side L, uplo L, trans N, diag N, db 512
M N kblasTRMM_REC GF/s (ms) kblasTRMM_CU GF/s (ms) cublasTRMM GF/s (ms) SP_REC SP_CU Error
====================================================================
200 512 CUDA runtime error: no kernel image is available for execution on the device (209) in Xtrmm at blas_l3/Xtrmm.cu:479
CUBLAS error: execution failed (13) in test_trmm at blas_l3/test_trmm.ch:202
Am I doing something wrong? I wish to use kblas for doing batched svd. How do I use this library? I did get some warnings while making kblas, could that be the reason for this error?
Hi
This usually happens when the gencode parameters used to compile the CUDA kernels don't match your GPU architecture.
I attached a video that walks through installing kblas-gpu from scratch on a Pascal-architecture GPU.
You can follow the steps and reach out to us if you still get such errors.
https://www.youtube.com/watch?v=jAWdo39M-xk
And here is a Google Doc that lists where to find the required files:
https://docs.google.com/document/d/1UF-53VoZOz8uBhdC8ob9uwW6NmuqnAJwGqWuu6Rn584/edit?usp=sharing
Hope it helps.
To use batched SVD, I think you can look at ./testing/batch_triangular/test_Xsvd_full_batch.cpp
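For the gencode mismatch described above, here is a minimal sketch of how the nvcc `-gencode` flag is formed from a GPU's compute capability. The helper name `gencode_flag` is hypothetical, and how kblas's build files consume the flag is not shown in this thread, so treat this as an illustration of the flag format only.

```python
def gencode_flag(compute_capability: str) -> str:
    """Build an nvcc -gencode flag for one compute capability, e.g. "6.1".

    Hypothetical helper for illustration; not part of kblas.
    """
    major, minor = compute_capability.split(".")
    sm = f"{int(major)}{int(minor)}"  # "6.1" -> "61"
    return f"-gencode arch=compute_{sm},code=sm_{sm}"


# A GPU with compute capability 6.1 needs kernels built for sm_61.
print(gencode_flag("6.1"))  # -gencode arch=compute_61,code=sm_61
```

If the kernels were built only for a different target (for example sm_62), a device with compute capability 6.1 has no matching kernel image, which is consistent with the "no kernel image is available" error.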
Thank you very much @hongyx11. I followed the tutorial and was able to get the library running. I tried running test_dtrmm and it ran successfully. However, I am getting memory errors when I try to run test_dsvd_full_batch even for very small sizes of matrices. Do you know why this might be happening?
./test_dsvd_full_batch -N 100:512
batchCount M N kblasSVD GF/s (ms) Error
4 100 512 1542792336 !!!! malloc_cpu failed for: h_Au
./test_dsvd_full_batch -N 10:10
batchCount M N kblasSVD GF/s (ms) Error
4 10 10 714802320 !!!! malloc_cpu failed for: h_Au
./test_ssvd_full_batch -N 200:200
batchCount M N kblasSVD GF/s (ms) Error
4 200 200 -1657445232 CUDA runtime error: out of memory (2) in test_Xsvd_full_batch at batch_triangular/test_Xsvd_full_batch.cpp:227
CUDA runtime error: out of memory (2) in test_Xsvd_full_batch at batch_triangular/test_Xsvd_full_batch.cpp:228
CUDA runtime error: out of memory (2) in Xset_pointer_4_core at batch_triangular/Xhelper_funcs.cuh:335
gpuKblasAssert: CUDA error batch_triangular/test_Xsvd_full_batch.cpp 257
./test_ssvd_full_batch -N 20:20
batchCount M N kblasSVD GF/s (ms) Error
4 20 20 1326126224 !!!! malloc_cpu failed for: h_Au
./test_ssvd_full_batch -N 2:2
batchCount M N kblasSVD GF/s (ms) Error
4 2 2 1879909520 !!!! malloc_cpu failed for: h_Au
./test_ssvd_full_batch -N 1:1
batchCount M N kblasSVD GF/s (ms) Error
4 1 1 1176986768 CUDA runtime error: out of memory (2) in test_Xsvd_full_batch at batch_triangular/test_Xsvd_full_batch.cpp:228
CUDA runtime error: out of memory (2) in Xset_pointer_4_core at batch_triangular/Xhelper_funcs.cuh:335
gpuKblasAssert: CUDA error batch_triangular/test_Xsvd_full_batch.cpp 257
Hi @rahulwankhede, what's the size of your GPU memory? Could you run nvidia-smi?
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 105... On | 00000000:01:00.0 On | N/A |
| 45% 23C P8 N/A / 75W | 605MiB / 4036MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1191 G /usr/lib/xorg/Xorg 28MiB |
| 0 1344 G /usr/bin/gnome-shell 47MiB |
| 0 1597 G /usr/lib/xorg/Xorg 204MiB |
| 0 1750 G /usr/bin/gnome-shell 252MiB |
| 0 2323 G ...AAAAAAAAAAAAAAgAAAAAAAAA --shared-files 53MiB |
| 0 19664 G /opt/teamviewer/tv_bin/TeamViewer 12MiB |
+-----------------------------------------------------------------------------+
My GPU memory is 4 GB. In case it's relevant, my arch_sm is 61 (I mistakenly said 62 in my original post, and I think installing for 62 was one of the reasons I was not able to run kblas). I have redone the installation, this time for the correct sm. I did it without loading a module, since CUDA 10.2 is already my default installation. GPU model: GTX 1050 Ti. GCC version: 7.5.0.
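As a rough sanity check on the sizes involved, the sketch below estimates the storage for the input matrices alone and compares it with the free GPU memory from the nvidia-smi output (4036 MiB total, 605 MiB used). It assumes the first three columns of the test output are batchCount, M and N, and it ignores any workspace or output buffers the test allocates, so it is only a lower bound and does not by itself explain the errors.

```python
def batch_bytes(batch_count: int, rows: int, cols: int, elem_bytes: int) -> int:
    """Bytes needed to store batch_count dense rows x cols matrices."""
    return batch_count * rows * cols * elem_bytes


MIB = 1024 * 1024
free_mib = 4036 - 605  # total minus used, taken from the nvidia-smi output

# (label, M, N, bytes per element): double = 8 bytes, float = 4 bytes
cases = [
    ("test_dsvd_full_batch -N 100:512", 100, 512, 8),
    ("test_ssvd_full_batch -N 200:200", 200, 200, 4),
]

for label, rows, cols, elem_bytes in cases:
    need_mib = batch_bytes(4, rows, cols, elem_bytes) / MIB
    print(f"{label}: {need_mib:.2f} MiB of input data, about {free_mib} MiB free")
```

Under these assumptions the inputs are only a few MiB, far below the free memory reported.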
Hi @rahulwankhede,
I don't have much insight into the memory usage of KBLAS. But as you can see, there is also a CPU malloc error, so you may need to look into that as well.
We tested your input on a Pascal GPU with 8 GB of memory and it passed without error.
I also noticed that there are visualization processes running on your GPU. Another option is to kill the processes on the GPU and try again. You are interested in the performance results of KBLAS, right? These processes will lower the performance. If not, you can just run a few simple sequential SVDs instead.
Best,
Yuxi
Thanks @hongyx11. There seems to be something wrong with my installation. I'll try installing on a different GPU with more memory and see if it works. Thanks a lot for the help. Cheers!