Compiling NNET example
Hi,
When I try to compile the neural net example, I get the following error:
nvcc -o nnet_ps -O3 --use_fast_math -ccbin g++ -Xcompiler "-Wall -O3 -I../../ -fopenmp -msse3 -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -I/usr/include/cuda/ -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_DIST_PS=0" -Xlinker "-lm -lm -lcudart -lcublas -lcurand -L/usr/lib64 -lopenblas -L/usr/lib64/atlas" nnet_ps.cu
/tmp/tmpxft_00000cc5_00000000-16_nnet_ps.o: In function `void NNet<mshadow::gpu>::SyncProc<1>(mshadow::Tensor<mshadow::gpu, 1, float>, mshadow::Tensor<mshadow::gpu, 1, float>, int)':
tmpxft_00000cc5_00000000-3_nnet_ps.cudafe1.cpp:(.text._ZN4NNetIN7mshadow3gpuEE8SyncProcILi1EEEvNS0_6TensorIS1_XT_EfEES5_i[_ZN4NNetIN7mshadow3gpuEE8SyncProcILi1EEEvNS0_6TensorIS1_XT_EfEES5_i]+0x108): undefined reference to `NNet<mshadow::gpu>::UpdateEntry::ApplyUpdate(mshadow::Stream<mshadow::gpu>*, void*)'
/tmp/tmpxft_00000cc5_00000000-16_nnet_ps.o: In function `void NNet<mshadow::gpu>::SyncProc<2>(mshadow::Tensor<mshadow::gpu, 2, float>, mshadow::Tensor<mshadow::gpu, 2, float>, int)':
tmpxft_00000cc5_00000000-3_nnet_ps.cudafe1.cpp:(.text._ZN4NNetIN7mshadow3gpuEE8SyncProcILi2EEEvNS0_6TensorIS1_XT_EfEES5_i[_ZN4NNetIN7mshadow3gpuEE8SyncProcILi2EEEvNS0_6TensorIS1_XT_EfEES5_i]+0x165): undefined reference to `NNet<mshadow::gpu>::UpdateEntry::ApplyUpdate(mshadow::Stream<mshadow::gpu>*, void*)'
/tmp/tmpxft_00000cc5_00000000-16_nnet_ps.o: In function `void NNet<mshadow::cpu>::SyncProc<1>(mshadow::Tensor<mshadow::cpu, 1, float>, mshadow::Tensor<mshadow::cpu, 1, float>, int)':
tmpxft_00000cc5_00000000-3_nnet_ps.cudafe1.cpp:(.text._ZN4NNetIN7mshadow3cpuEE8SyncProcILi1EEEvNS0_6TensorIS1_XT_EfEES5_i[_ZN4NNetIN7mshadow3cpuEE8SyncProcILi1EEEvNS0_6TensorIS1_XT_EfEES5_i]+0xe1): undefined reference to `NNet<mshadow::cpu>::UpdateEntry::ApplyUpdate(mshadow::Stream<mshadow::cpu>*, void*)'
/tmp/tmpxft_00000cc5_00000000-16_nnet_ps.o: In function `void NNet<mshadow::cpu>::SyncProc<2>(mshadow::Tensor<mshadow::cpu, 2, float>, mshadow::Tensor<mshadow::cpu, 2, float>, int)':
tmpxft_00000cc5_00000000-3_nnet_ps.cudafe1.cpp:(.text._ZN4NNetIN7mshadow3cpuEE8SyncProcILi2EEEvNS0_6TensorIS1_XT_EfEES5_i[_ZN4NNetIN7mshadow3cpuEE8SyncProcILi2EEEvNS0_6TensorIS1_XT_EfEES5_i]+0x13d): undefined reference to `NNet<mshadow::cpu>::UpdateEntry::ApplyUpdate(mshadow::Stream<mshadow::cpu>*, void*)'
collect2: error: ld returned 1 exit status
Makefile:34: recipe for target 'nnet_ps' failed
Any idea what I can do to fix it? Is this example out of date?
Szymon
OK, adding these two explicit instantiations:
template class NNet<cpu>;
template class NNet<gpu>;
under the definition of the NNet class in nnet_ps.cu resolves this issue for me. I think my compiler is not very happy with implicitly instantiating the members of a class nested inside a class template.
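For anyone hitting the same link error, here is a minimal sketch of the pattern, not the actual nnet_ps.cu code. The `cpu`/`gpu` tag structs, the simplified `ApplyUpdate(void *)` signature, and the `Run()` helper are all stand-ins I made up for illustration. It shows a static member of a nested struct used as a callback, followed by the explicit instantiations that force the compiler to emit every member of `NNet<cpu>` and `NNet<gpu>`, including the nested struct's members:

```cpp
// Hypothetical device tags standing in for mshadow::cpu and mshadow::gpu.
struct cpu {};
struct gpu {};

template <typename xpu>
class NNet {
 public:
  // Nested, non-template struct inside a class template.
  struct UpdateEntry {
    int applied = 0;
    // Static callback with a C-style signature (simplified for this sketch).
    static void ApplyUpdate(void *arg);
  };

  // Takes the address of the nested static member and calls through it,
  // similar in spirit to registering a callback.
  int Run() {
    UpdateEntry entry;
    void (*callback)(void *) = &UpdateEntry::ApplyUpdate;
    callback(&entry);
    return entry.applied;
  }
};

// Out-of-class definition of the nested struct's static member.
template <typename xpu>
void NNet<xpu>::UpdateEntry::ApplyUpdate(void *arg) {
  static_cast<UpdateEntry *>(arg)->applied += 1;
}

// Explicit instantiation: emits all members of these specializations,
// including NNet<xpu>::UpdateEntry::ApplyUpdate, in this translation unit.
template class NNet<cpu>;
template class NNet<gpu>;
```

A standard-conforming compiler should also instantiate `ApplyUpdate` implicitly here, so the explicit instantiations are a workaround for the toolchain rather than something the language requires.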
Also, when I run the code on 4 CPUs I barely get any speedup (only about 30% faster than a single CPU). Is that expected here? I know that Hogwild code normally scales linearly, but this is not Hogwild, is it?
Thank you,
Szymon
Yes, I think it is normal. This is mainly because of the synchronization cost, and the example is not pure Hogwild. When you are running on multiple GPUs, you cannot freely write to a shared memory region. The demo is mainly meant to demonstrate mshadow-ps.
You will see a much better speedup on larger problems and on the real neural nets you work on in cxxnet.
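As a rough way to reason about the number reported above, here is a small sketch that applies Amdahl's law to the observed speedup. This is my own simplification, not something measured in the demo: it treats all synchronization cost as a serial fraction of the work, and the function names are hypothetical.

```cpp
// Amdahl's law: speedup on n workers when a fraction p of the work
// runs in parallel and the remaining (1 - p) is serial.
double AmdahlSpeedup(double p, int n) {
  return 1.0 / ((1.0 - p) + p / n);
}

// Inverse of the formula above: the parallel fraction implied by an
// observed speedup s on n workers (requires n > 1 and s > 0).
double ImpliedParallelFraction(double s, int n) {
  return (1.0 - 1.0 / s) / (1.0 - 1.0 / n);
}
```

Under this model, a 1.3x speedup on 4 workers corresponds to a parallel fraction of roughly 0.31, i.e. most of the wall-clock time in a problem this small goes to work that does not parallelize. With a larger model the compute per synchronization grows, which is consistent with the better scaling expected on bigger problems.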