leonzfa/iResNet

Error about "python ../train_rob.py 2>&1 | tee rob.log"

Opened this issue · 9 comments

python ../train_rob.py 2>&1 | tee rob.log

F0610 14:24:18.777849 8200 io.cpp:36] Check failed: fd != -1 (-1 vs. -1) File not found: ../ROB_training/solver_rob_stage_one.prototxt
*** Check failure stack trace: ***
@ 0x7fc5b12285cd google::LogMessage::Fail()
@ 0x7fc5b122a433 google::LogMessage::SendToLog()
@ 0x7fc5b122815b google::LogMessage::Flush()
@ 0x7fc5b122ae1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7fc5b1bff8f8 caffe::ReadProtoFromTextFile()
@ 0x7fc5b1bcbaf6 caffe::ReadSolverParamsFromTextFileOrDie()
@ 0x40aaf6 train()
@ 0x407704 main
@ 0x7fc5b005f830 __libc_start_main
@ 0x407eb9 _start
@ (nil) (unknown)
('args:', [])
Executing /media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/iResNet/build/tools/caffe train -solver ../ROB_training/solver_rob_stage_one.prototxt -gpu 0

Hi, when I run "python ../train_rob.py 2>&1 | tee rob.log", I run into the problem above. Can you help me? Thank you very much!

@zhangguanghui1 Hi, I have updated the code. Please follow the instructions in README.md. I will keep checking the code, and if there is still any problem I will fix it as soon as possible.

@leonzfa Thank you for your reply! But I ran into a new problem.

......
layer {
name: "deconv1"
type: "Deconvolution"
bottom: "concat2"
top: "deconv1"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 0
decay_mult: 0
}
conv
I0610 20:05:49.922690 20066 layer_factory.hpp:77] Creating layer SceneFlow_train
I0610 20:05:49.922704 20066 net.cpp:91] Creating Layer SceneFlow_train
I0610 20:05:49.922710 20066 net.cpp:404] SceneFlow_train -> img0_sceneflow
I0610 20:05:49.922734 20066 net.cpp:404] SceneFlow_train -> img1_sceneflow
I0610 20:05:49.922741 20066 net.cpp:404] SceneFlow_train -> disp_sceneflow
F0610 20:05:49.922785 20066 custom_data_layer.cpp:361] Check failed: mdb_env_open(mdb_env_, this->layer_param_.data_param().source().c_str(), 0x20000|0x200000, 0664) == 0 (2 vs. 0) mdb_env_open failed
*** Check failure stack trace: ***
@ 0x7f832c16b5cd google::LogMessage::Fail()
@ 0x7f832c16d433 google::LogMessage::SendToLog()
@ 0x7f832c16b15b google::LogMessage::Flush()
@ 0x7f832c16de1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f832c985bc3 caffe::CustomDataLayer<>::LayerSetUp()
@ 0x7f832cad9f1f caffe::Net<>::Init()
@ 0x7f832cadba51 caffe::Net<>::Net()
@ 0x7f832c8540ea caffe::Solver<>::InitTrainNet()
@ 0x7f832c8546b7 caffe::Solver<>::Init()
@ 0x7f832c854a59 caffe::Solver<>::Solver()
@ 0x7f832cafb243 caffe::Creator_AdamSolver<>()
@ 0x40ad99 train()
@ 0x407704 main
@ 0x7f832afa2830 __libc_start_main
@ 0x407eb9 _start
@ (nil) (unknown)
('args:', [])
Executing /media/zgh/df7f0859-33fc-4a7e-afef-851b5c4f4005/zgh/iResNet_yuan/build/tools/caffe train -solver ./solver_rob_stage_one.prototxt -gpu 0

Thanks a lot!

@zhangguanghui1 It seems to be a problem with the LMDB. Have you downloaded the Scene Flow dataset? If you have downloaded all the datasets, please use the terminal to run "sh ./make_lmdbs.sh".
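The "(2 vs. 0)" in the mdb_env_open failure above is errno 2 (No such file or directory), i.e. the LMDB directory was never created. A minimal sketch for checking this, assuming a hypothetical LMDB path (substitute the `source:` value from the data layers in your prototxt):

```python
import os

def lmdb_ready(path):
    """Return True if `path` looks like a generated LMDB directory.

    mdb_env_open fails with errno 2 exactly when the directory or
    these files are missing, which matches the error in the log.
    """
    return all(os.path.isfile(os.path.join(path, f))
               for f in ("data.mdb", "lock.mdb"))

# Hypothetical path - substitute the `source:` value from your
# train_rob_stage_one.prototxt data layers.
print(lmdb_ready("./lmdbs/SceneFlow_train_lmdb"))
```

If this prints False for a dataset you intend to train on, rerun make_lmdbs.sh and check its output for that dataset.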

@leonzfa Must I download all the datasets, including ETH3D, KITTI, Middlebury, and Scene Flow (FlyingThings3D)?
Can I run the code with only Middlebury and Scene Flow? I have only downloaded those two datasets according to your README so far. Thank you so much!

@zhangguanghui1 You don't need to download all the datasets. But you should modify the scripts (train_rob_stage_one.prototxt and train_rob_stage_two.prototxt) and remove the "CustomData" layers that are used to read data from the other datasets. Also make sure that the LMDBs have been generated successfully.
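For reference, a "CustomData" input layer in the stage-one prototxt looks roughly like the sketch below; the top names match the SceneFlow_train layer in the log above, but the exact fields and the source path are illustrative, so match them against your actual train_rob_stage_one.prototxt. Delete the whole layer block for each dataset you are not using, together with any layers that consume its top blobs:

```
layer {
  name: "SceneFlow_train"
  type: "CustomData"
  top: "img0_sceneflow"
  top: "img1_sceneflow"
  top: "disp_sceneflow"
  data_param {
    # hypothetical path - this is the LMDB that mdb_env_open fails on
    source: "path/to/SceneFlow_lmdb"
    backend: LMDB
  }
}
```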

@leonzfa OK, thank you! I found that the LMDB for FlyingThings3D was indeed not generated successfully, and I will try to debug the code according to your reply. Thank you very much!

@leonzfa
While running the evaluation step from the README,
i.e. python test_rob.py model/iResNet_ROB.caffemodel, I get the following error and no disparity is generated:

Image size = 720896
Executing /home/sbansal/iResNet/build/tools/caffe.bin test -model tmp/deploy.prototxt -weights model/iResNet_ROB.caffemodel -iterations 1 -gpu 0 2>&1 | tee log.txt
Aborted (core dumped)
** The resulting disparity is stored in submission_results/low_res_two_view/electro_1s.pfm
Image size = 720896
Executing /home/sbansal/iResNet/build/tools/caffe.bin test -model tmp/deploy.prototxt -weights model/iResNet_ROB.caffemodel -iterations 1 -gpu 0 2>&1 | tee log.txt
Aborted (core dumped)
** The resulting disparity is stored in submission_results/low_res_two_view/forest_2s.pfm
Image size = 1032192
Executing /home/sbansal/iResNet/build/tools/caffe.bin test -model tmp/deploy.prototxt -weights model/iResNet_ROB.caffemodel -iterations 1 -gpu 0 2>&1 | tee log.txt
Aborted (core dumped)
** The resulting disparity is stored in submission_results/low_res_two_view/playground_2l.pfm
What could be the cause of this?

w0617 commented

@leonzfa
I met the same problem as @Shruti-Bansal.
When I run python test_rob.py model/iResNet_ROB.caffemodel to test datasets_middlebury2014, I got:

Image size = 786432
Executing /home/w/Desktop/Reconstruction/Net-Disparity/iResNet/build/tools/caffe.bin test -model tmp/deploy.prototxt -weights model/iResNet_ROB.caffemodel -iterations 1 -gpu 0 2>&1 | tee log.txt
Aborted (core dumped)
** The resulting disparity is stored in submission_results/Middlebury2014_iResNet_ROB/testH/Djembe/disp0iResNet_ROB.pfm
Executing /home/w/Desktop/Reconstruction/Net-Disparity/iResNet/build/tools/caffe.bin test -model tmp/deploy_mirror.prototxt -weights model/iResNet_ROB.caffemodel -iterations 1 -gpu 0 2>&1 | tee log.txt
Aborted (core dumped)
** The resulting disparity is stored in submission_results/Middlebury2014_iResNet_ROB/testH/Djembe/disp0iResNet_ROB_s.pfm

And no PFM file is generated. I tried reducing MAX_SIZE, but it did not work.
What should I do?

@zhangguanghui1 Have you successfully generated the LMDB for FlyingThings3D? I cannot find a way to generate its list with reshape_dataset.m. Does it need to be handled another way? @leonzfa
And when I remove the "CustomData" layers for "SceneFlow_train", this error occurs:

...
I0130 15:31:58.194545 6143 net.cpp:694] Ignoring source layer Silence_input
I0130 15:31:58.194995 6143 net.cpp:694] Ignoring source layer DummyData1
I0130 15:31:58.195000 6143 net.cpp:694] Ignoring source layer blob9_DummyData1_0_split
I0130 15:32:25.986487 6143 solver.cpp:414] Test net output #0: finaldisp_loss0 = 27.4449 (* 1 = 27.4449 loss)
I0130 15:32:25.986505 6143 solver.cpp:414] Test net output #1: flow_loss0 = 27.4476
I0130 15:32:25.986510 6143 solver.cpp:414] Test net output #2: flow_loss1 = 26.9698
I0130 15:32:25.986512 6143 solver.cpp:414] Test net output #3: flow_loss2 = 24.2479
I0130 15:32:25.986515 6143 solver.cpp:414] Test net output #4: flow_loss3 = 25.0326
I0130 15:32:25.986519 6143 solver.cpp:414] Test net output #5: flow_loss4 = 24.3454
I0130 15:32:25.986522 6143 solver.cpp:414] Test net output #6: flow_loss5 = 23.753
I0130 15:32:25.986524 6143 solver.cpp:414] Test net output #7: flow_loss6 = 12.7744
I0130 15:32:25.986529 6143 solver.cpp:414] Test net output #8: ires_disp_loss0 = 27.4681 (* 1 = 27.4681 loss)
I0130 15:32:25.986543 6143 solver.cpp:414] Test net output #9: ires_disp_loss1 = 27.077 (* 0.2 = 5.41541 loss)
I0130 15:32:25.986547 6143 solver.cpp:414] Test net output #10: ires_disp_loss2 = 25.2379 (* 0.2 = 5.04757 loss)
F0130 15:32:26.190157 6143 l1loss_layer.cu:102] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
*** Check failure stack trace: ***
@ 0x7f9f356105cd google::LogMessage::Fail()
@ 0x7f9f35612433 google::LogMessage::SendToLog()
@ 0x7f9f3561015b google::LogMessage::Flush()
@ 0x7f9f35612e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f9f3607c465 caffe::L1LossLayer<>::Forward_gpu()
@ 0x7f9f35fdcfd2 caffe::Net<>::ForwardFromTo()
@ 0x7f9f35fdd0f7 caffe::Net<>::Forward()
@ 0x7f9f35d18e0f caffe::Solver<>::Step()
@ 0x7f9f35d19989 caffe::Solver<>::Solve()
@ 0x40b497 train()
@ 0x4075a8 main
@ 0x7f9f340c4830 __libc_start_main
@ 0x407d19 _start
@ (nil) (unknown)
Permutation:
15 12 42 37 8 23 46 33 11 49 39 16 26 55 56 48 7 5 29 51 3 10 44 20 28 34 59 25 19 21 30 6 32 24 9 40 13 0 2 27 54 57 52 17 1 4 43 31 22 18 47 36 41 38 53 35 58 14 45 50
Permutation:
191 173 186 37 184 144 67 138 11 110 187 192 26 194 56 48 7 198 183 188 103 197 44 77 176 133 59 121 113 149 146 124 166 82 9 126 104 107 2 143 54 93 105 156 139 4 43 74 22 130 152 66 76 100 195 99 62 14 135 50 158 153 170 73 63 88 178 150 145 57 34 136 19 97 95 190 167 108 174 109 16 70 24 98 101 38 15 20 160 168 79 128 29 114 52 129 53 180 111 118 85 12 106 3 155 175 33 131 87 189 132 84 47 72 69 40 159 1 35 196 120 116 68 181 147 6 60 86 83 71 18 193 162 81 91 45 161 21 102 117 165 36 96 27 23 177 151 125 41 122 46 30 112 89 148 171 123 65 115 64 61 179 49 42 119 127 32 185 17 75 172 13 58 134 78 94 28 137 141 31 25 157 8 92 182 154 163 169 90 51 39 164 80 199 55 142 140 10 5 0
Permutation:
86 91 42 37 8 23 67 102 11 49 75 80 26 55 56 48 7 5 92 90 103 10 44 77 28 81 59 64 72 68 30 6 32 82 9 60 104 107 2 27 54 93 105 65 1 4 43 74 22 18 47 66 76 100 96 99 62 14 45 50 40 89 58 73 63 88 36 46 21 57 34 31 19 97 95 39 41 87 78 51 16 70 24 98 101 38 15 20 61 17 79 83 29 69 52 71 53 25 84 35 85 12 106 3 13 94 33 0
Permutation:
191 173 186 37 184 144 67 138 11 110 187 192 26 194 56 48 7 198 183 188 103 197 44 77 176 133 59 121 113 149 146 124 166 82 9 126 104 107 2 143 54 93 105 156 139 4 43 74 22 130 152 66 76 100 195 99 62 14 135 50 158 153 170 73 63 88 178 150 145 57 34 136 19 97 95 190 167 108 174 109 16 70 24 98 101 38 15 20 160 168 79 128 29 114 52 129 53 180 111 118 85 12 106 3 155 175 33 131 87 189 132 84 47 72 69 40 159 1 35 196 120 116 68 181 147 6 60 86 83 71 18 193 162 81 91 45 161 21 102 117 165 36 96 27 23 177 151 125 41 122 46 30 112 89 148 171 123 65 115 64 61 179 49 42 119 127 32 185 17 75 172 13 58 134 78 94 28 137 141 31 25 157 8 92 182 154 163 169 90 51 39 164 80 199 55 142 140 10 5 0
('args:', [])
Executing /home/zhangxu/project/iResNet-master/build/tools/caffe.bin train -solver ../ROB_training/solver_rob_stage_one.prototxt -gpu 0

If you can help me with this, I would be very grateful!