uoguelph-mlrg/Theano-MPI

Segmentation fault for all examples with python3

Nqabz opened this issue · 6 comments

Nqabz commented

I tried all of your examples but keep running into segmentation faults whether I use 1, 2, 3, 4, ... or 8 GPUs. Is there a fix for this? I see you had an earlier bug related to segmentation faults. Here is my trace:


Theano-MPI started 2 workers for 
 1.updating Cifar10_model params through iterations and
 2.exchange the params with EASGD
See output log.
cluster3.31164hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
cluster3.31163hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
cluster3.31165hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
Using cuDNN version 5005 on context None
Mapped name None to device cuda2: Tesla K80 (0000:08:00.0)
INFO (theano.gof.compilelock): Waiting for existing lock by process '31163' (I am process '31164')
INFO (theano.gof.compilelock): To manually release the lock, delete /home/.theano/compiledir_Linux-3.10-el7.x86_64-x86_64-with-centos-7.3.1611-Core-x86_64-3.4.2-64/lock_dir
Using cuDNN version 5005 on context None
Mapped name None to device cuda0: Tesla K80 (0000:04:00.0)
Using Theano backend.
input shape is: (3, 32, 32, 256)
subtract shape is: (3, 32, 32, 1)
center margin is: 0
crop size is: 32
flag_on is: <GpuArrayType<None>(float32, ())>
[cluster3:31163] *** Process received signal ***
[cluster3:31163] Signal: Segmentation fault (11)
[cluster3:31163] Signal code:  (128)
[cluster3:31163] Failing at address: (nil)
[cluster3:31163] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x7fd4390c8370]
[cluster3:31163] [ 1] /usr/local/lib/libgpuarray.so.2(gpukernel_release+0xa)[0x7fd40c8c898a]
[cluster3:31163] [ 2] /usr/local/lib/libgpuarray.so.2(GpuKernel_clear+0x11)[0x7fd40c8d2131]
[cluster3:31163] [ 3] /usr/local/lib/libgpuarray.so.2(GpuKernel_init+0xb0)[0x7fd40c8d2200]
[cluster3:31163] [ 4] /home/dlq/.theano/compiledir_Linux-3.10-el7.x86_64-x86_64-with-centos-7.3.1611-Core-x86_64-3.4.2-64/tmp7kzja1uh/m7a35c53365410a3b80c5389af5d2afa5.so(+0x190a)[0x7fd3e6bde90a]
[cluster3:31163] [ 5] /usr/local/lib/libpython3.4m.so.1.0(PyEval_EvalFrameEx+0x5bd3)[0x7fd43941c253]
[cluster3:31163] [ 6] /usr/local/lib/libpython3.4m.so.1.0(PyEval_EvalCodeEx+0x882)[0x7fd43941f0f2]
[cluster3:31163] [20] /usr/local/lib/libpython3.4m.so.1.0(PyEval_EvalCodeEx+0x882)[0x7fd43941f0f2]
[cluster3:31163] [21] /usr/local/lib/libpython3.4m.so.1.0(PyEval_EvalFrameEx+0x5d13)[0x7fd43941c393]
[cluster3:31163] [22] /usr/local/lib/libpython3.4m.so.1.0(PyEval_EvalCodeEx+0x882)[0x7fd43941f0f2]
[cluster3:31163] [23] /usr/local/lib/libpython3.4m.so.1.0(PyEval_EvalFrameEx+0x5d13)[0x7fd43941c393]
[cluster3:31163] [24] /usr/local/lib/libpython3.4m.so.1.0(PyEval_EvalCodeEx+0x882)[0x7fd43941f0f2]
[cluster3:31163] [25] /usr/local/lib/libpython3.4m.so.1.0(PyEval_EvalFrameEx+0x5d13)[0x7fd43941c393]
[cluster3:31163] [26] /usr/local/lib/libpython3.4m.so.1.0(PyEval_EvalCodeEx+0x882)[0x7fd43941f0f2]
[cluster3:31163] [27] /usr/local/lib/libpython3.4m.so.1.0(PyEval_EvalFrameEx+0x5d13)[0x7fd43941c393]
[cluster3:31163] [28] /usr/local/lib/libpython3.4m.so.1.0(PyEval_EvalCodeEx+0x882)[0x7fd43941f0f2]
[cluster3:31163] [29] /usr/local/lib/libpython3.4m.so.1.0(PyEval_EvalFrameEx+0x5d13)[0x7fd43941c393]
[gist-smapper3:31163] *** End of error message ***
Using cuDNN version 5005 on context None
Mapped name None to device cuda1: Tesla K80 (0000:05:00.0)
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 31163 on node gist-smapper3 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Rule session 31159 terminated with return code: 139.

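The return code 139 and mpirun's "exited on signal 11" message describe the same event. Under the common shell convention, a process killed by signal N is reported with exit status 128 + N, so 139 = 128 + 11 = SIGSEGV. Here is a minimal standard-library sketch of that decoding. `describe_exit_status` is a hypothetical helper and is not part of Theano-MPI or Open MPI.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Translate an exit status into a readable description.

    Shells and many job launchers report a child killed by signal N as
    128 + N. Python's subprocess reports the same event as a negative
    returncode, -N, so both forms are accepted. Note that a program may
    also legitimately exit with a status above 128, so this is a heuristic.
    """
    if status < 0:
        signum = -status
    elif status > 128:
        signum = status - 128
    else:
        return f"exited with status {status}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown signal"
    return f"killed by signal {signum} ({name})"


# The return code reported at the end of the trace above.
print(describe_exit_status(139))  # killed by signal 11 (SIGSEGV)
```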
hma02 commented

@Nqabz

Thanks for reporting. Let me try debugging it and get back to you.

hma02 commented

@Nqabz

It seems Python 3 support in Theano-MPI is still experimental, but Python 2 should work, since Theano-MPI was developed on Python 2. I installed Anaconda Python 3.6 and tested it with the test_model.py script:

$ cd theanompi/models/
$ python3 test_model.py cifar10 Cifar10_model
Using cuDNN version 5110 on context None
Mapped name None to device cuda0: GeForce GTX TITAN Black (0000:03:00.0)
rank0: bad list is [], extended to 156
rank0: bad list is [], extended to 39
Cifar10_model
Layer Subtract	 	 in (3, 32, 32, 256) --> out (3, 32, 32, 256)
Layer Crop	 	 in [  3  32  32 256] --> out (3, 28, 28, 256)
Layer Dimshuffle     	 in [  3  28  28 256] --> out (256, 3, 28, 28)
Layer Conv (cudnn) 	 in [256   3  28  28] --> out (256, 64, 24, 24)
Layer Pool	 	 in [256  64  24  24] --> out (256, 64, 12, 12)
Layer Conv (cudnn) 	 in [256  64  12  12] --> out (256, 128, 8, 8)
Layer Pool	 	 in [256 128   8   8] --> out (256, 128, 4, 4)
Layer Conv (cudnn) 	 in [256 128   4   4] --> out (256, 64, 2, 2)
Layer Flatten	 	 in [256  64   2   2] --> out (256, 256)
Layer FC	 	 in [256 256] --> out (256, 256)
Layer Dropout0.5 	 in [256 256] --> out (256, 256)
Layer Softmax	 	 in [256 256] --> out (256, 10)
[64  3  5  5]
[64]
[128  64   5   5]
[128]
[ 64 128   3   3]
[64]
[256 256]
[256]
[256  10]
[10]
model size 0.336 M floats
compiling training function...
[GPU8:155335] *** Process received signal ***
[GPU8:155335] Signal: Segmentation fault (11)
[GPU8:155335] Signal code:  (128)
[GPU8:155335] Failing at address: (nil)
[GPU8:155335] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f9e7a4bd330]
[GPU8:155335] [ 1] /export/mlrg/hma02/anaconda2-x86_64/lib/python2.7/site-packages/pygpu/../../../libgpuarray.so.2(gpukernel_release+0xa)[0x7f9e555b554a]
[GPU8:155335] [ 2] /export/mlrg/hma02/anaconda2-x86_64/lib/python2.7/site-packages/pygpu/../../../libgpuarray.so.2(GpuKernel_clear+0x11)[0x7f9e555bf571]
[GPU8:155335] [ 3] /export/mlrg/hma02/anaconda2-x86_64/lib/python2.7/site-packages/pygpu/../../../libgpuarray.so.2(GpuKernel_init+0xe8)[0x7f9e555bf678]
[GPU8:155335] [ 4] /export/mlrg/hma02/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.13-64/tmpGRA547/52e419a14bf3ba72bf7b2d47176d6a81.so(+0x17da)[0x7f9e30d0c7da]
[GPU8:155335] [ 5] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x84fd)[0x7f9e7a7c8bad]
[GPU8:155335] [ 6] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e)[0x7f9e7a7c9c3e]
[GPU8:155335] [ 7] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8b47)[0x7f9e7a7c91f7]
[GPU8:155335] [ 8] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e)[0x7f9e7a7c9c3e]
[GPU8:155335] [ 9] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8b47)[0x7f9e7a7c91f7]
[GPU8:155335] [10] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e)[0x7f9e7a7c9c3e]
[GPU8:155335] [11] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8b47)[0x7f9e7a7c91f7]
[GPU8:155335] [12] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e)[0x7f9e7a7c9c3e]
[GPU8:155335] [13] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8b47)[0x7f9e7a7c91f7]
[GPU8:155335] [14] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e)[0x7f9e7a7c9c3e]
[GPU8:155335] [15] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8b47)[0x7f9e7a7c91f7]
[GPU8:155335] [16] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e)[0x7f9e7a7c9c3e]
[GPU8:155335] [17] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8b47)[0x7f9e7a7c91f7]
[GPU8:155335] [18] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e)[0x7f9e7a7c9c3e]
[GPU8:155335] [19] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8b47)[0x7f9e7a7c91f7]
[GPU8:155335] [20] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e)[0x7f9e7a7c9c3e]
[GPU8:155335] [21] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8b47)[0x7f9e7a7c91f7]
[GPU8:155335] [22] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e)[0x7f9e7a7c9c3e]
[GPU8:155335] [23] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8b47)[0x7f9e7a7c91f7]
[GPU8:155335] [24] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e)[0x7f9e7a7c9c3e]
[GPU8:155335] [25] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8b47)[0x7f9e7a7c91f7]
[GPU8:155335] [26] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e)[0x7f9e7a7c9c3e]
[GPU8:155335] [27] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(+0x79b68)[0x7f9e7a744b68]
[GPU8:155335] [28] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyObject_Call+0x53)[0x7f9e7a714e93]
[GPU8:155335] [29] /export/mlrg/hma02/anaconda2-x86_64/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x61d6)[0x7f9e7a7c6886]
[GPU8:155335] *** End of error message ***
Segmentation fault (core dumped)
 $ 
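The model summary printed before the crash can be checked by hand. The parameter shapes listed above add up to 351,946 floats. That matches "model size 0.336 M floats" only if "M" means 2^20 (351,946 / 1,048,576 ≈ 0.3356). With M = 10^6 the figure would be 0.352, so the 2^20 reading is an inference from the numbers, not something confirmed from the Theano-MPI source. The layer shapes are likewise consistent with unpadded, stride-1 convolutions followed by 2x2 pooling, which is also an assumption. The sketch below reproduces both calculations.

```python
from math import prod  # Python 3.8+

# Parameter shapes printed by test_model.py for Cifar10_model (copied from
# the log above): three conv layers, then the FC and softmax layers.
PARAM_SHAPES = [
    (64, 3, 5, 5), (64,),
    (128, 64, 5, 5), (128,),
    (64, 128, 3, 3), (64,),
    (256, 256), (256,),
    (256, 10), (10,),
]


def count_params(shapes):
    """Total number of floats across all parameter tensors."""
    return sum(prod(s) for s in shapes)


def valid_conv(size, kernel):
    """Spatial output size of an unpadded, stride-1 convolution (assumed)."""
    return size - kernel + 1


n = count_params(PARAM_SHAPES)
print(n)                          # 351946
print(round(n / 2**20, 3))        # 0.336, assuming "M" means 2**20

# Spatial size through Crop -> Conv5 -> Pool2 -> Conv5 -> Pool2 -> Conv3.
size = 28                         # after the 32 -> 28 crop
size = valid_conv(size, 5) // 2   # conv gives 24, pool gives 12
size = valid_conv(size, 5) // 2   # conv gives 8, pool gives 4
size = valid_conv(size, 3)        # conv gives 2
print(64 * size * size)           # 256 inputs to the FC layer after Flatten
```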

I also got a segfault while compiling the training function.

I tried this on both ppc64le and x86_64 systems, following these steps:

  1. I had a working Open MPI (version 1.8.8), then installed Anaconda Python 3.
  2. git clone mpi4py and build it against my Open MPI: check which mpirun is on your PATH, set the corresponding path in the [openmpi] section of mpi.cfg, run python3 setup.py build --mpi=openmpi, then pip3 install -U .
  3. conda install pygpu
  4. conda install theano
  5. git clone hickle, cd hickle, git checkout dev and pip3 install -U .
  6. git clone theanompi and pip3 install -U .
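Step 2 amounts to deriving the Open MPI prefix from the location of mpirun and pointing mpi4py's mpi.cfg at it. Below is a minimal sketch under stated assumptions. The template is modeled on the [openmpi] example section that ships in mpi4py's mpi.cfg, so compare it with the file in your own checkout before pasting. The helper functions are hypothetical and belong to neither mpi4py nor Theano-MPI. The `%(...)s` expansion check uses Python's configparser and is only a sanity check.

```python
import configparser
import os
import shutil
from typing import Optional

# Template modeled on the [openmpi] example in mpi4py's mpi.cfg (assumption:
# verify the keys against the mpi.cfg in your mpi4py checkout).
OPENMPI_SECTION = """\
[openmpi]
mpi_dir              = {mpi_dir}
mpicc                = %(mpi_dir)s/bin/mpicc
mpicxx               = %(mpi_dir)s/bin/mpicxx
library_dirs         = %(mpi_dir)s/lib
runtime_library_dirs = %(library_dirs)s
"""


def find_mpirun() -> Optional[str]:
    """Equivalent of `which mpirun`, with symlinks resolved."""
    path = shutil.which("mpirun")
    return os.path.realpath(path) if path else None


def mpi_dir_from_mpirun(mpirun_path: str) -> str:
    """Map <prefix>/bin/mpirun to <prefix>, the value used for mpi_dir."""
    return os.path.dirname(os.path.dirname(mpirun_path))


def openmpi_section(mpi_dir: str) -> str:
    """Fill in the template; the %(...)s references are left for the parser."""
    return OPENMPI_SECTION.format(mpi_dir=mpi_dir)


def resolved_mpicc(section_text: str) -> str:
    """Expand %(...)s references with configparser as a sanity check."""
    cfg = configparser.ConfigParser()
    cfg.read_string(section_text)
    return cfg["openmpi"]["mpicc"]


if __name__ == "__main__":
    mpirun = find_mpirun()
    if mpirun is None:
        print("mpirun not found on PATH; load your Open MPI environment first")
    else:
        print(openmpi_section(mpi_dir_from_mpirun(mpirun)))
```

After editing mpi.cfg, the build continues as in step 2, with python3 setup.py build --mpi=openmpi followed by pip3 install -U .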

The test passes with Python 3 on ppc64le but not on x86_64. Judging from the traceback, libpython3.6 on x86_64 seems to have a problem with libgpuarray. The test passes with Python 2 on both architectures.
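Whether the test passed depended on the interpreter version, the CPU architecture and the installed packages, so it helps to capture those alongside any trace. The following is a minimal standard-library sketch. The module list is an assumption about what matters for this stack, and `environment_report` is a hypothetical helper, not part of Theano-MPI.

```python
import importlib
import platform


def environment_report(modules=("theano", "pygpu", "mpi4py", "hickle")):
    """Collect interpreter version, CPU architecture and package versions."""
    report = {
        "python": platform.python_version(),
        "machine": platform.machine(),  # e.g. 'x86_64' or 'ppc64le'
    }
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
        except Exception as exc:  # a broken install can fail in other ways
            report[name] = f"import failed: {exc!r}"
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key:8s} {value}")
```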

If you need to try out ideas with Theano-MPI, I recommend starting with Python 2 until Python 3 support is complete.

Nqabz commented

Thanks for checking. I will try Python 2.7.

hma02 commented

@Nqabz

I just upgraded my Theano to the bleeding-edge version. You only need to change step 4 of the steps above to:

  4. git clone theano, cd theano and pip3 install -U .

I tested the BSP example, and it now works with Python 3:

$ python3 test_bsp.py 
Theano-MPI started 2 workers for 
 1.updating Cifar10_model params through iterations and
 2.exchange the params with BSP(cdd,nccl32)
See output log.
Using cuDNN version 5110 on context None
Mapped name None to device cuda0: GeForce GTX TITAN Black (0000:83:00.0)
Using cuDNN version 5110 on context None
Mapped name None to device cuda1: GeForce GTX TITAN (0000:04:00.0)
rank0: bad list is [], extended to 156
rank0: bad list is [38], extended to 40
Cifar10_model
Layer Subtract	 	 in (3, 32, 32, 256) --> out (3, 32, 32, 256)
Layer Crop	 	 in [  3  32  32 256] --> out (3, 28, 28, 256)
Layer Dimshuffle     	 in [  3  28  28 256] --> out (256, 3, 28, 28)
Layer Conv (cudnn) 	 in [256   3  28  28] --> out (256, 64, 24, 24)
Layer Pool	 	 in [256  64  24  24] --> out (256, 64, 12, 12)
Layer Conv (cudnn) 	 in [256  64  12  12] --> out (256, 128, 8, 8)
Layer Pool	 	 in [256 128   8   8] --> out (256, 128, 4, 4)
Layer Conv (cudnn) 	 in [256 128   4   4] --> out (256, 64, 2, 2)
Layer Flatten	 	 in [256  64   2   2] --> out (256, 256)
Layer FC	 	 in [256 256] --> out (256, 256)
Layer Dropout0.5 	 in [256 256] --> out (256, 256)
Layer Softmax	 	 in [256 256] --> out (256, 10)
[64  3  5  5]
[64]
[128  64   5   5]
[128]
[ 64 128   3   3]
[64]
[256 256]
[256]
[256  10]
[10]
model size 0.336 M floats
compiling training function...
INFO (theano.gof.compilelock): Waiting for existing lock by process '188888' (I am process '188887')
INFO (theano.gof.compilelock): To manually release the lock, delete /export/mlrg/hma02/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-3.6.1-64/lock_dir
compiling validation function...
Compile time: 39.133 s

40 2.206262 0.819336
time per 40 batches: 0.85 (train 0.38 comm 0.39 wait 0.08)
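In the BSP output above, the phase breakdown sums to the total (0.38 + 0.39 + 0.08 = 0.85). That means "comm", which is presumably the parameter exchange, accounts for roughly 46% of each 40-batch window. The log does not state a time unit. Below is a small parser for that line. The regex assumes the exact format shown in this one log and may not match other Theano-MPI versions.

```python
import re

# Matches lines in the format printed by test_bsp.py above, e.g.
#   time per 40 batches: 0.85 (train 0.38 comm 0.39 wait 0.08)
TIMING_RE = re.compile(
    r"time per (?P<batches>\d+) batches: (?P<total>[\d.]+) "
    r"\(train (?P<train>[\d.]+) comm (?P<comm>[\d.]+) wait (?P<wait>[\d.]+)\)"
)


def parse_timing(line):
    """Return batch count, total, per-phase times and each phase's share.

    Times are returned in whatever unit the log uses.
    """
    m = TIMING_RE.search(line)
    if m is None:
        raise ValueError(f"not a timing line: {line!r}")
    total = float(m["total"])
    phases = {k: float(m[k]) for k in ("train", "comm", "wait")}
    shares = {k: v / total for k, v in phases.items()}
    return int(m["batches"]), total, phases, shares


batches, total, phases, shares = parse_timing(
    "time per 40 batches: 0.85 (train 0.38 comm 0.39 wait 0.08)"
)
print(f"{shares['comm']:.0%} of the window went to comm")  # 46% of the window went to comm
```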

Nqabz commented

@hma02: This is very helpful. I will try it tomorrow. Did you test this on an x86_64 machine?

hma02 commented

@Nqabz
Yes, on x86_64.