twhui/LiteFlowNet

train.py -gpu 0 2>&1 | tee ./log.txt ERROR

Crow77 opened this issue · 8 comments

I get this error when I run the train.py script.
Can anyone help??

data_augmentation_layer.cu:551] Check failed: error == cudaSuccess (8 vs. 0) invalid device function
*** Check failure stack trace: ***
@ 0x7f74eda1d5cd google::LogMessage::Fail()
@ 0x7f74eda1f433 google::LogMessage::SendToLog()
@ 0x7f74eda1d15b google::LogMessage::Flush()
@ 0x7f74eda1fe1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f74ee331fbf caffe::DataAugmentationLayer<>::Forward_gpu()
@ 0x7f74ee28e052 caffe::Net<>::ForwardFromTo()
@ 0x7f74ee28e177 caffe::Net<>::Forward()
@ 0x7f74ee0e0792 caffe::Solver<>::Test()
@ 0x7f74ee0e11ae caffe::Solver<>::TestAll()
@ 0x7f74ee0e12d2 caffe::Solver<>::Step()
@ 0x7f74ee0e1e59 caffe::Solver<>::Solve()
@ 0x40b497 train()
@ 0x4075a8 main
@ 0x7f74ec4d1830 __libc_start_main
@ 0x407d19 _start
@ (nil) (unknown)

twhui commented

What is your CUDA version?

Crow77 commented

So the error is because of the CUDA version?

The README at https://github.com/twhui/LiteFlowNet says:
"Installation was tested under Ubuntu 14.04.5/16.04.2 with CUDA 8.0, cuDNN 5.1 and openCV 2.4.8/3.1.0."

That's why I installed that exact CUDA version in the first place, to avoid version-related errors.
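For reference, the "invalid device function" check in data_augmentation_layer.cu usually means the Caffe build contains no kernel code for the GPU's compute capability, i.e. the CUDA_ARCH used at compile time does not cover the installed card. Below is a minimal sketch of the relevant block in Caffe's Makefile.config; the -gencode values are only examples and would need to match the actual GPU before rebuilding with make clean && make all.

# Keep only the -gencode lines that match the compute capability of the GPU,
# then rebuild Caffe; a missing entry shows up as "invalid device function" at runtime.
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
             -gencode arch=compute_35,code=sm_35 \
             -gencode arch=compute_50,code=sm_50 \
             -gencode arch=compute_52,code=sm_52 \
             -gencode arch=compute_61,code=sm_61 \
             -gencode arch=compute_61,code=compute_61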

Now it throws an "out of memory" error. Is there a parameter I can adjust to avoid this?

syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
@ 0x7f05833465cd google::LogMessage::Fail()
@ 0x7f0583348433 google::LogMessage::SendToLog()
@ 0x7f058334615b google::LogMessage::Flush()
@ 0x7f0583348e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f05839b3670 caffe::SyncedMemory::to_gpu()
@ 0x7f05839b2599 caffe::SyncedMemory::mutable_gpu_data()
@ 0x7f0583bfe3c2 caffe::Blob<>::mutable_gpu_data()
@ 0x7f0583c5d1ed caffe::ConcatLayer<>::Forward_gpu()
@ 0x7f0583b8a142 caffe::Net<>::ForwardFromTo()
@ 0x7f0583b8a267 caffe::Net<>::Forward()
@ 0x7f0583bf4302 caffe::Solver<>::Test()
@ 0x7f0583bf4d1e caffe::Solver<>::TestAll()
@ 0x7f0583bf4e42 caffe::Solver<>::Step()
@ 0x7f0583bf59c9 caffe::Solver<>::Solve()
@ 0x40b497 train()
@ 0x4075a8 main
@ 0x7f0581dfa830 __libc_start_main
@ 0x407d19 _start
@ (nil) (unknown)

twhui commented

The best solution is to use a better GPU (or run the script in multi-gpu mode if you have 2 or more GPUs). Otherwise, you can reduce the batch size and increase the number of iterations accordingly in solver.prototxt.
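A rough sketch of that kind of adjustment (these are standard Caffe solver.prototxt fields, but the values are placeholders rather than LiteFlowNet's shipped settings, and the batch size itself may be defined in the data layer of the training prototxt):

# solver.prototxt (sketch): after halving the batch size, roughly double
# the iteration counts so training still sees a similar amount of data.
max_iter: 600000       # e.g. doubled after halving the batch size
test_iter: 160         # scale this too if the test batch size was reduced
test_interval: 5000
snapshot: 10000

Halving the batch size roughly halves the activation memory needed on the GPU, at the cost of noisier gradient estimates, which is why the iteration count is increased to compensate.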

Crow77 commented

I'll try that...
Thanks