twhui/LiteFlowNet

train.py -gpu 0 2>&1 | tee ./log.txt ERROR

Crow77 opened this issue · 8 comments

I get this error when I run the train.py script.
Can anyone help??

data_augmentation_layer.cu:551] Check failed: error == cudaSuccess (8 vs. 0) invalid device function
*** Check failure stack trace: ***
@ 0x7f74eda1d5cd google::LogMessage::Fail()
@ 0x7f74eda1f433 google::LogMessage::SendToLog()
@ 0x7f74eda1d15b google::LogMessage::Flush()
@ 0x7f74eda1fe1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f74ee331fbf caffe::DataAugmentationLayer<>::Forward_gpu()
@ 0x7f74ee28e052 caffe::Net<>::ForwardFromTo()
@ 0x7f74ee28e177 caffe::Net<>::Forward()
@ 0x7f74ee0e0792 caffe::Solver<>::Test()
@ 0x7f74ee0e11ae caffe::Solver<>::TestAll()
@ 0x7f74ee0e12d2 caffe::Solver<>::Step()
@ 0x7f74ee0e1e59 caffe::Solver<>::Solve()
@ 0x40b497 train()
@ 0x4075a8 main
@ 0x7f74ec4d1830 __libc_start_main
@ 0x407d19 _start
@ (nil) (unknown)

twhui commented

What is your CUDA version?

Crow77 commented

So the error is because of the CUDA version?

The README at https://github.com/twhui/LiteFlowNet says:
"Installation was tested under Ubuntu 14.04.5/16.04.2 with CUDA 8.0, cuDNN 5.1 and openCV 2.4.8/3.1.0."

That's why I installed that exact CUDA version in the first place, to avoid version-related errors.
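For reference, the "invalid device function" check in data_augmentation_layer.cu usually means the Caffe build contains no kernel code for the GPU's compute capability, i.e. the CUDA_ARCH used at compile time does not cover the installed card. Below is a minimal sketch of the relevant block in Caffe's Makefile.config; the -gencode values are only examples and would need to match the actual GPU before rebuilding with make clean && make all.

# Keep only the -gencode lines that match the compute capability of the GPU,
# then rebuild Caffe; a missing entry shows up as "invalid device function" at runtime.
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
             -gencode arch=compute_35,code=sm_35 \
             -gencode arch=compute_50,code=sm_50 \
             -gencode arch=compute_52,code=sm_52 \
             -gencode arch=compute_61,code=sm_61 \
             -gencode arch=compute_61,code=compute_61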

Now it throws an "out of memory" error. Is there a parameter I can adjust to avoid this?

syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
@ 0x7f05833465cd google::LogMessage::Fail()
@ 0x7f0583348433 google::LogMessage::SendToLog()
@ 0x7f058334615b google::LogMessage::Flush()
@ 0x7f0583348e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f05839b3670 caffe::SyncedMemory::to_gpu()
@ 0x7f05839b2599 caffe::SyncedMemory::mutable_gpu_data()
@ 0x7f0583bfe3c2 caffe::Blob<>::mutable_gpu_data()
@ 0x7f0583c5d1ed caffe::ConcatLayer<>::Forward_gpu()
@ 0x7f0583b8a142 caffe::Net<>::ForwardFromTo()
@ 0x7f0583b8a267 caffe::Net<>::Forward()
@ 0x7f0583bf4302 caffe::Solver<>::Test()
@ 0x7f0583bf4d1e caffe::Solver<>::TestAll()
@ 0x7f0583bf4e42 caffe::Solver<>::Step()
@ 0x7f0583bf59c9 caffe::Solver<>::Solve()
@ 0x40b497 train()
@ 0x4075a8 main
@ 0x7f0581dfa830 __libc_start_main
@ 0x407d19 _start
@ (nil) (unknown)

twhui commented

The best solution is to use a better GPU (or run the script in multi-gpu mode if you have 2 or more GPUs). Otherwise, you can reduce the batch size and increase the number of iterations accordingly in solver.prototxt.
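A rough sketch of that kind of adjustment (these are standard Caffe solver.prototxt fields, but the values are placeholders rather than LiteFlowNet's shipped settings, and the batch size itself may be defined in the data layer of the training prototxt):

# solver.prototxt (sketch): after halving the batch size, roughly double
# the iteration counts so training still sees a similar amount of data.
max_iter: 600000       # e.g. doubled after halving the batch size
test_iter: 160         # scale this too if the test batch size was reduced
test_interval: 5000
snapshot: 10000

Halving the batch size roughly halves the activation memory needed on the GPU, at the cost of noisier gradient estimates, which is why the iteration count is increased to compensate.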

Crow77 commented

I'll try that...
Thanks