dmlc/mshadow

Compiling NNET example


Hi,

When I try to compile the neural net example, I get the following error:

nvcc -o nnet_ps -O3 --use_fast_math -ccbin g++  -Xcompiler "-Wall -O3 -I../../ -fopenmp -msse3 -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -I/usr/include/cuda/ -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_DIST_PS=0" -Xlinker "-lm -lm -lcudart -lcublas -lcurand -L/usr/lib64 -lopenblas -L/usr/lib64/atlas" nnet_ps.cu
/tmp/tmpxft_00000cc5_00000000-16_nnet_ps.o: In function `void NNet<mshadow::gpu>::SyncProc<1>(mshadow::Tensor<mshadow::gpu, 1, float>, mshadow::Tensor<mshadow::gpu, 1, float>, int)':
tmpxft_00000cc5_00000000-3_nnet_ps.cudafe1.cpp:(.text._ZN4NNetIN7mshadow3gpuEE8SyncProcILi1EEEvNS0_6TensorIS1_XT_EfEES5_i[_ZN4NNetIN7mshadow3gpuEE8SyncProcILi1EEEvNS0_6TensorIS1_XT_EfEES5_i]+0x108): undefined reference to `NNet<mshadow::gpu>::UpdateEntry::ApplyUpdate(mshadow::Stream<mshadow::gpu>*, void*)'
/tmp/tmpxft_00000cc5_00000000-16_nnet_ps.o: In function `void NNet<mshadow::gpu>::SyncProc<2>(mshadow::Tensor<mshadow::gpu, 2, float>, mshadow::Tensor<mshadow::gpu, 2, float>, int)':
tmpxft_00000cc5_00000000-3_nnet_ps.cudafe1.cpp:(.text._ZN4NNetIN7mshadow3gpuEE8SyncProcILi2EEEvNS0_6TensorIS1_XT_EfEES5_i[_ZN4NNetIN7mshadow3gpuEE8SyncProcILi2EEEvNS0_6TensorIS1_XT_EfEES5_i]+0x165): undefined reference to `NNet<mshadow::gpu>::UpdateEntry::ApplyUpdate(mshadow::Stream<mshadow::gpu>*, void*)'
/tmp/tmpxft_00000cc5_00000000-16_nnet_ps.o: In function `void NNet<mshadow::cpu>::SyncProc<1>(mshadow::Tensor<mshadow::cpu, 1, float>, mshadow::Tensor<mshadow::cpu, 1, float>, int)':
tmpxft_00000cc5_00000000-3_nnet_ps.cudafe1.cpp:(.text._ZN4NNetIN7mshadow3cpuEE8SyncProcILi1EEEvNS0_6TensorIS1_XT_EfEES5_i[_ZN4NNetIN7mshadow3cpuEE8SyncProcILi1EEEvNS0_6TensorIS1_XT_EfEES5_i]+0xe1): undefined reference to `NNet<mshadow::cpu>::UpdateEntry::ApplyUpdate(mshadow::Stream<mshadow::cpu>*, void*)'
/tmp/tmpxft_00000cc5_00000000-16_nnet_ps.o: In function `void NNet<mshadow::cpu>::SyncProc<2>(mshadow::Tensor<mshadow::cpu, 2, float>, mshadow::Tensor<mshadow::cpu, 2, float>, int)':
tmpxft_00000cc5_00000000-3_nnet_ps.cudafe1.cpp:(.text._ZN4NNetIN7mshadow3cpuEE8SyncProcILi2EEEvNS0_6TensorIS1_XT_EfEES5_i[_ZN4NNetIN7mshadow3cpuEE8SyncProcILi2EEEvNS0_6TensorIS1_XT_EfEES5_i]+0x13d): undefined reference to `NNet<mshadow::cpu>::UpdateEntry::ApplyUpdate(mshadow::Stream<mshadow::cpu>*, void*)'
collect2: error: ld returned 1 exit status
Makefile:34: recipe for target 'nnet_ps' failed

Any ideas what I can do to fix it? Is this example out of date?

Szymon

OK, adding:

template class NNet<cpu>;
template class NNet<gpu>;

under the definition of the NNet class in nnet_ps.cu resolves this issue for me. I think my compiler is not very happy with the instantiation of nested templated classes...
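For anyone hitting the same link error, here is a minimal sketch of the workaround in isolation. It is not the actual nnet_ps.cu code: TinyNet, Entry, CpuTag, GpuTag and run_demo are hypothetical names made up for illustration. It only shows the general pattern of a class template with a nested struct whose static member is used through a function pointer, followed by explicit instantiation definitions that ask the compiler to emit all members for the listed types. A conforming compiler should also build it without those two lines, so treat them as a workaround rather than a diagnosis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical device tags standing in for mshadow::cpu / mshadow::gpu.
struct CpuTag {};
struct GpuTag {};

template <typename Device>
class TinyNet {
 public:
  // Nested helper, loosely analogous to NNet<xpu>::UpdateEntry in the log.
  struct Entry {
    float weight;
    float grad;
    // Only ever reached through a function pointer, callback style.
    static void ApplyUpdate(void* arg);
  };

  typedef void (*Callback)(void*);

  void Add(float weight, float grad) {
    entries_.push_back(Entry{weight, grad});
  }

  // Applies the update to every entry through the callback pointer.
  void SyncAll() {
    Callback cb = &Entry::ApplyUpdate;
    assert(cb != nullptr);
    for (std::size_t i = 0; i < entries_.size(); ++i) {
      cb(&entries_[i]);
    }
  }

  float WeightAt(std::size_t i) const { return entries_[i].weight; }

 private:
  std::vector<Entry> entries_;
};

// Out-of-class definition of the nested struct's static member.
template <typename Device>
void TinyNet<Device>::Entry::ApplyUpdate(void* arg) {
  Entry* e = static_cast<Entry*>(arg);
  e->weight -= 0.5f * e->grad;  // fixed step size, for illustration only
}

// Explicit instantiation definitions: emit all members of TinyNet,
// including the nested Entry, for exactly these two types.
template class TinyNet<CpuTag>;
template class TinyNet<GpuTag>;

// Small driver: one entry with weight 1 and gradient 1, one update step.
float run_demo() {
  TinyNet<CpuTag> net;
  net.Add(1.0f, 1.0f);
  net.SyncAll();
  return net.WeightAt(0);
}
```

The explicit instantiations have to come after the out-of-class member definitions, which matches placing them under the class definition as described above.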

Also, when I run the code on 4 CPUs I barely get any speedup (only about 30% faster than a single CPU). Is that expected here? I know that Hogwild code normally scales linearly, but this is not Hogwild, is it?

Thank you,
Szymon

Yes, I think that is normal. It is mainly because of the synchronization cost, and because this is not pure Hogwild: when you are running on multiple GPUs, you cannot freely write to a shared memory region. The demo is mainly meant to demonstrate mshadow-ps.

You will see a great speedup for larger problems and for the real neural nets that you work on in cxxnet.
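As a rough way to read the "about 30% faster on 4 CPUs" figure, one can assume Amdahl's law applies. That is an assumption added here, not something measured in this thread, and it lumps synchronization cost together with any other serial work. Under that assumption, a 1.3x speedup on 4 workers corresponds to only about 31% of the runtime being parallelized. The function names below are made up for this sketch.

```cpp
#include <cassert>

// Amdahl's law: speedup on n workers when a fraction p of the
// single-worker runtime parallelizes perfectly and the rest is serial.
double amdahl_speedup(double p, int workers) {
  assert(workers >= 1);
  assert(p >= 0.0 && p <= 1.0);
  return 1.0 / ((1.0 - p) + p / workers);
}

// Inverse of the formula above: the parallel fraction p implied by an
// observed speedup on a given number of workers.
double implied_parallel_fraction(double speedup, int workers) {
  assert(workers > 1);
  assert(speedup > 0.0);
  return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers);
}
```

For example, implied_parallel_fraction(1.3, 4) comes out at roughly 0.31. A larger problem spends more time in compute per synchronization, which raises that fraction, and this is consistent with the reply above.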