marcellacornia/sam

running on GPU

Closed this issue · 6 comments

Hi,
Thank you for sharing the code! I successfully ran it on the sample images on CPU. But when I tried to configure Theano with device=gpu or device=cuda, both failed with quite long error messages.
For cuda in particular, there was a warning message at the beginning:
/home/mingbo/anaconda3/envs/sam/lib/python3.5/site-packages/theano/sandbox/cuda/__init__.py:631: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.1.

My first question is, do you expect your code to run faster on GPU? If not, I will just stay with CPU.
If the answer is yes, do you think Theano 0.9.0 can work with cuDNN 6?
I am using CUDA 8.0 and cuDNN 6 now.
Thanks a lot!

Ah, I just upgraded theano to 0.10.0b1 and it worked!

My first question is, do you expect your code to run faster on GPU?

What kind of speed improvement did you get?

I am trying to speed up testing on the GPU. So far the code is unable to use the GPU.
@lcnature I do not see any errors; I manually pass cuda3 as the device flag when running the code.
Is there something specific one needs to do to make sure the code uses the GPU?
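
One way to rule out a configuration problem is to set the device through the THEANO_FLAGS environment variable before Theano is imported. The following is a minimal sketch, assuming the device name cuda3 mentioned above; the helper name build_theano_flags is hypothetical and not part of the repository.

```python
import os

def build_theano_flags(device="cuda3", float_x="float32"):
    """Return a THEANO_FLAGS string such as 'device=cuda3,floatX=float32'."""
    return "device={},floatX={}".format(device, float_x)

# THEANO_FLAGS has to be set before `import theano` runs, because Theano
# reads its configuration when the package is first imported.
os.environ["THEANO_FLAGS"] = build_theano_flags()
print(os.environ["THEANO_FLAGS"])
```

After importing Theano in the same process, theano.config.device should echo the requested device; if it still reports cpu, the flag did not take effect.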

Did you see any speed improvement yourself?
@marcellacornia Thanks for the code. Are you able to run this code (testing) on GPU, and in your experience how long does it take to test a single image on GPU vs CPU? Thanks in advance for your input.

@kkhetarpal
I am sorry, that was a long time ago and I have forgotten what I did.
It might have something to do with having the right versions and settings of Theano and Keras.
I am not sure whether the following information is critical for running on GPU, but just for your reference:

Here is my ~/.keras/keras.json:
{
"floatx": "float32",
"epsilon": 1e-07,
"backend": "theano",
"image_data_format": "channels_last",
"image_dim_ordering": "th"
}
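
To confirm which backend Keras will pick up, the file above can be read directly. This is a small sketch that only uses the standard library; the function name load_keras_backend is hypothetical, and the default path assumes the usual ~/.keras/keras.json location quoted above.

```python
import json
import os

def load_keras_backend(path=None):
    """Return (backend, floatx) from a Keras config file, or None if it is missing."""
    if path is None:
        path = os.path.expanduser("~/.keras/keras.json")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        cfg = json.load(f)
    # Missing keys come back as None rather than raising.
    return cfg.get("backend"), cfg.get("floatx")

print(load_keras_backend())
```

For the configuration shown above, the expected result would be the pair ("theano", "float32").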

And in ~/.theanorc, I had these lines:
[global]
floatX = float32
device = cuda
[nvcc]
fastmath = True
[gcc]
cxxflags = -ID:\MinGW\include
(You should also check that the root variable points to your CUDA installation folder.)
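
Because the Theano configuration file uses INI syntax, it can be checked with Python's configparser. The sketch below parses an inline copy of part of the settings above rather than the real file; the function name requested_device is hypothetical, and the fallback of "cpu" reflects Theano's default device.

```python
import configparser

# Inline copy of part of the configuration quoted above, for illustration.
THEANORC_TEXT = """
[global]
floatX = float32
device = cuda
[nvcc]
fastmath = True
"""

def requested_device(text):
    """Return the device named in the [global] section, or 'cpu' if none is set."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.get("global", "device", fallback="cpu")

print(requested_device(THEANORC_TEXT))
```

To check an actual file, read its contents with open(...) and pass the text to the same function.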

And for some reason, I modified the imports in main.py. The line import os, cv2, sys was changed to:

import os, sys
sys.path.append('/usr/local/lib/python2.7/site-packages')
import cv2

And these are the versions of packages in my conda environment (I see some redundancy maybe due to my fiddling around, so you probably do not have to install all of them):

alabaster                 0.7.10                   py35_0  
appdirs                   1.4.3                     <pip>
babel                     2.5.0                    py35_0  
certifi                   2016.2.28                py35_0  
decorator                 4.1.2                     <pip>
docutils                  0.14                     py35_0  
h5py                      2.7.0               np111py35_0  
hdf5                      1.8.17                        2  
imagesize                 0.7.1                    py35_0  
jinja2                    2.9.6                    py35_0  
Keras                     1.1.0                     <pip>
libgfortran               3.0.0                         1  
libgpuarray               0.6.9                         0  
Mako                      1.0.7                     <pip>
mako                      1.0.6                    py35_0  
markupsafe                1.0                      py35_0  
mkl                       2017.0.3                      0  
mkl-service               1.1.2                    py35_3  
nose                      1.3.7                    py35_1  
numpy                     1.11.3                   py35_0  
opencv-python             3.3.0.10                  <pip>
openssl                   1.0.2l                        0  
pip                       9.0.1                    py35_1  
py                        1.4.34                    <pip>
pycuda                    2017.1.1                  <pip>
pygments                  2.2.0                    py35_0  
pygpu                     0.6.9                    py35_0  
pytest                    3.2.3                     <pip>
python                    3.5.4                         0  
pytools                   2017.6                    <pip>
pytz                      2017.2                   py35_0  
PyYAML                    3.12                      <pip>
readline                  6.2                           2  
requests                  2.14.2                   py35_0  
scikit-cuda               0.5.1                     <pip>
scipy                     0.19.0              np111py35_0  
setuptools                36.4.0                   py35_1  
six                       1.10.0                   py35_0  
six                       1.11.0                    <pip>
snowballstemmer           1.2.1                    py35_0  
sphinx                    1.6.3                    py35_0  
sphinxcontrib             1.0                      py35_0  
sphinxcontrib-websupport  1.0.1                    py35_0  
sqlite                    3.13.0                        0  
Theano                    0.10.0b1                  <pip>
tk                        8.5.18                        0  
wheel                     0.29.0                   py35_0  
xz                        5.2.3                         0  
zlib                      1.2.11                        0  

Finally, once you get it running on GPU, you can raise b_s in config.py from its default of 1 to as large a value as your GPU can handle (I just tried 64).
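
To illustrate what a larger b_s means in practice, here is a minimal sketch of splitting a list of image paths into batches. It is not the repository's own generator, which may treat a final partial batch differently; make_batches and the file names are hypothetical.

```python
def make_batches(paths, b_s):
    """Split a list into consecutive batches of at most b_s items."""
    if b_s < 1:
        raise ValueError("b_s must be at least 1")
    return [paths[i:i + b_s] for i in range(0, len(paths), b_s)]

# Hypothetical file names, for illustration only.
images = ["img_{:03d}.jpg".format(i) for i in range(10)]
for batch in make_batches(images, b_s=4):
    print(len(batch), batch[0])
```

With b_s = 1 every image is a separate forward pass; a larger b_s lets the GPU process several images per pass, at the cost of more GPU memory.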

Thanks a lot @marcellacornia for your inputs.
I was able to run it on GPU. Just FYI for anyone interested: testing a single image (without batching) takes 4 seconds.
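
For anyone who wants to compare CPU and GPU timings on their own machine, a simple harness like the following could be used. It is a sketch only: seconds_per_item is a hypothetical helper, and the lambda stands in for whatever function runs the model on one image.

```python
import time

def seconds_per_item(fn, items):
    """Call fn on every item and return the average wall-clock seconds per call."""
    if not items:
        raise ValueError("items must not be empty")
    start = time.perf_counter()
    for item in items:
        fn(item)
    elapsed = time.perf_counter() - start
    return elapsed / len(items)

# Placeholder workload; replace the lambda with a real per-image prediction call.
avg = seconds_per_item(lambda x: sum(range(1000)), list(range(5)))
print("average seconds per item: {:.6f}".format(avg))
```

The first call likely includes one-time compilation overhead, so it is worth timing a warm-up image separately from the rest.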

@kkhetarpal What type of GPU did you try it on?