Error when training ImageNet
DrustZ opened this issue · 18 comments
When I train with "Inception-BN.conf", training terminates at round 0, and the log is
an illegal memory access was encountered
an illegal memory access was encountered
And when I use "kaiming.conf", it stops after initializing all the layers.
The log is
OpenCV Error: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in Mat, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/matrix.cpp, line 323
OpenCV Error: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in Mat, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/matrix.cpp, line 323
OpenCV Error: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in Mat, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/matrix.cpp, line 323
OpenCV Error: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in Mat, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/matrix.cpp, line 323
terminate called after throwing an instance of 'cv::Exception'
terminate called recursively
what(): /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/matrix.cpp:323: error: (-215) 0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows in function Mat
terminate called recursively
terminate called recursively
What is the size of your input image?
Here are the .conf files:
Inception-BN.conf
data = train
iter = imgrec
# image_list = "/media/DATA1/Imagenet/train_list_shuffle.lst"
image_rec = "/media/DATA1/Imagenet/train_shuffle.bin"
image_mean = "models/mean_224.bin"
rand_crop=1
rand_mirror=1
shuffle=1
iter = threadbuffer
iter = end
eval = val
iter = imgrec
# image_list = "/media/DATA1/Imagenet/val_list.lst"
image_rec = "/media/DATA1/Imagenet/val.bin"
image_mean = "models/mean_224.bin"
# no random crop and mirror in test
iter = end
netconfig = start
layer[0->0.1] = conv:conv_1
kernel_size = 7
nchannel = 64
pad = 3
stride = 2
layer[0.1->0.2] = batch_norm:bn_1
layer[0.2->1] = relu:relu_1
layer[1->2] = max_pooling:max_pool_1
kernel_size = 3
stride = 2
layer[2->2.1] = conv:conv_2_reduce
kernel_size = 1
nchannel = 64
pad = 0
stride = 1
layer[2.1->2.2] = batch_norm:bn_2_1
layer[2.2->3] = relu:relu_2_1
layer[3->3.1] = conv:conv_2
kernel_size = 3
nchannel = 192
pad = 1
stride = 1
layer[3.1->3.2] = batch_norm:bn_2
layer[3.2->4] = relu:relu_2
layer[4->5] = max_pooling:max_pool_2
kernel_size = 3
stride = 2
##### inception 3a #####
layer[5->6.1.0,6.2.0,6.3.0,6.4.0] = split:split_3a_split
## inception 1x1
layer[6.1.0->6.1.1] = conv:conv_3a_1x1
kernel_size = 1
nchannel = 64
pad = 0
stride = 1
layer[6.1.1->6.1.2] = batch_norm:bn_3a_1x1
layer[6.1.2->6.1.3] = relu:relu_3a_1x1
## inception 3x3
layer[6.2.0->6.2.1] = conv:conv_3a_3x3_reduce
kernel_size = 1
nchannel = 64
pad = 0
stride = 1
layer[6.2.1->6.2.2] = batch_norm:bn_3a_3x3_reduce
layer[6.2.2->6.2.3] = relu:relu_3a_3x3_reduce
layer[6.2.3->6.2.4] = conv:conv_3a_3x3
kernel_size = 3
nchannel = 64
pad = 1
stride = 1
layer[6.2.4->6.2.5] = batch_norm:bn_3a_3x3
layer[6.2.5->6.2.6] = relu:relu_3a_3x3
## inception double 3x3
layer[6.3.0->6.3.1] = conv:conv_3a_double_3x3_reduce
kernel_size = 1
nchannel = 64
pad = 0
stride = 1
layer[6.3.1->6.3.2] = batch_norm:bn_3a_double_3x3_reduce
layer[6.3.2->6.3.3] = relu:relu_3a_double_3x3_reduce
layer[6.3.3->6.3.4] = conv:conv_3a_double_3x3_0
kernel_size = 3
nchannel = 96
pad = 1
stride = 1
layer[6.3.4->6.3.5] = batch_norm:bn_3a_double_3x3_0
layer[6.3.5->6.3.6] = relu:relu_3a_double_3x3_0
layer[6.3.6->6.3.7] = conv:conv_3a_double_3x3_1
kernel_size = 3
nchannel = 96
pad = 1
stride = 1
layer[6.3.7->6.3.8] = batch_norm:bn_3a_double_3x3_1
layer[6.3.8->6.3.9] = relu:relu_3a_double_3x3_1
## inception proj
layer[6.4.0->6.4.1] = avg_pooling:avg_pool_3a_pool
kernel_size = 3
stride = 1
pad = 1
layer[6.4.1->6.4.2] = conv:conv_3a_proj
kernel_size = 1
nchannel = 32
pad = 0
stride = 1
layer[6.4.2->6.4.3] = batch_norm:bn_3a_proj
layer[6.4.3->6.4.4] = relu:relu_3a_proj
layer[6.1.3,6.2.6,6.3.9,6.4.4->6] = ch_concat:ch_concat_3a_chconcat
##### inception 3b #####
layer[6->7.1.0,7.2.0,7.3.0,7.4.0] = split:split_3b_split
## inception 1x1
layer[7.1.0->7.1.1] = conv:conv_3b_1x1
kernel_size = 1
nchannel = 64
pad = 0
stride = 1
layer[7.1.1->7.1.2] = batch_norm:bn_3b_1x1
layer[7.1.2->7.1.3] = relu:relu_3b_1x1
## inception 3x3
layer[7.2.0->7.2.1] = conv:conv_3b_3x3_reduce
kernel_size = 1
nchannel = 64
pad = 0
stride = 1
layer[7.2.1->7.2.2] = batch_norm:bn_3b_3x3_reduce
layer[7.2.2->7.2.3] = relu:relu_3b_3x3_reduce
layer[7.2.3->7.2.4] = conv:conv_3b_3x3
kernel_size = 3
nchannel = 96
pad = 1
stride = 1
layer[7.2.4->7.2.5] = batch_norm:bn_3b_3x3
layer[7.2.5->7.2.6] = relu:relu_3b_3x3
## inception double 3x3
layer[7.3.0->7.3.1] = conv:conv_3b_double_3x3_reduce
kernel_size = 1
nchannel = 64
pad = 0
stride = 1
layer[7.3.1->7.3.2] = batch_norm:bn_3b_double_3x3_reduce
layer[7.3.2->7.3.3] = relu:relu_3b_double_3x3_reduce
layer[7.3.3->7.3.4] = conv:conv_3b_double_3x3_0
kernel_size = 3
nchannel = 96
pad = 1
stride = 1
layer[7.3.4->7.3.5] = batch_norm:bn_3b_double_3x3_0
layer[7.3.5->7.3.6] = relu:relu_3b_double_3x3_0
layer[7.3.6->7.3.7] = conv:conv_3b_double_3x3_1
kernel_size = 3
nchannel = 96
pad = 1
stride = 1
layer[7.3.7->7.3.8] = batch_norm:bn_3b_double_3x3_1
layer[7.3.8->7.3.9] = relu:relu_3b_double_3x3_1
## inception proj
layer[7.4.0->7.4.1] = avg_pooling:avg_pool_3b_pool
kernel_size = 3
stride = 1
pad = 1
layer[7.4.1->7.4.2] = conv:conv_3b_proj
kernel_size = 1
nchannel = 64
pad = 0
stride = 1
layer[7.4.2->7.4.3] = batch_norm:bn_3b_proj
layer[7.4.3->7.4.4] = relu:relu_3b_proj
layer[7.1.3,7.2.6,7.3.9,7.4.4->7] = ch_concat:ch_concat_3b_chconcat
##### inception 3c #####
layer[7->8.2.0,8.3.0,8.4.0] = split:split_3c_split
## inception 3x3
layer[8.2.0->8.2.1] = conv:conv_3c_3x3_reduce
kernel_size = 1
nchannel = 128
pad = 0
stride = 1
layer[8.2.1->8.2.2] = batch_norm:bn_3c_3x3_reduce
layer[8.2.2->8.2.3] = relu:relu_3c_3x3_reduce
layer[8.2.3->8.2.4] = conv:conv_3c_3x3
kernel_size = 3
nchannel = 160
pad = 1
stride = 2
layer[8.2.4->8.2.5] = batch_norm:bn_3c_3x3
layer[8.2.5->8.2.6] = relu:relu_3c_3x3
## inception double 3x3
layer[8.3.0->8.3.1] = conv:conv_3c_double_3x3_reduce
kernel_size = 1
nchannel = 64
pad = 0
stride = 1
layer[8.3.1->8.3.2] = batch_norm:bn_3c_double_3x3_reduce
layer[8.3.2->8.3.3] = relu:relu_3c_double_3x3_reduce
layer[8.3.3->8.3.4] = conv:conv_3c_double_3x3_0
kernel_size = 3
nchannel = 96
pad = 1
stride = 1
layer[8.3.4->8.3.5] = batch_norm:bn_3c_double_3x3_0
layer[8.3.5->8.3.6] = relu:relu_3c_double_3x3_0
layer[8.3.6->8.3.7] = conv:conv_3c_double_3x3_1
kernel_size = 3
nchannel = 96
pad = 1
stride = 2
layer[8.3.7->8.3.8] = batch_norm:bn_3c_double_3x3_1
layer[8.3.8->8.3.9] = relu:relu_3c_double_3x3_1
## inception proj
layer[8.4.0->8.4.1] = max_pooling:max_pool_3c_pool
kernel_size = 3
stride = 2
layer[8.2.6,8.3.9,8.4.1->8] = ch_concat:ch_concat_3c_chconcat
##### inception 4a #####
layer[8->9.1.0,9.2.0,9.3.0,9.4.0] = split:split_4a_split
## inception 1x1
layer[9.1.0->9.1.1] = conv:conv_4a_1x1
kernel_size = 1
nchannel = 224
pad = 0
stride = 1
layer[9.1.1->9.1.2] = batch_norm:bn_4a_1x1
layer[9.1.2->9.1.3] = relu:relu_4a_1x1
## inception 3x3
layer[9.2.0->9.2.1] = conv:conv_4a_3x3_reduce
kernel_size = 1
nchannel = 64
pad = 0
stride = 1
layer[9.2.1->9.2.2] = batch_norm:bn_4a_3x3_reduce
layer[9.2.2->9.2.3] = relu:relu_4a_3x3_reduce
layer[9.2.3->9.2.4] = conv:conv_4a_3x3
kernel_size = 3
nchannel = 96
pad = 1
stride = 1
layer[9.2.4->9.2.5] = batch_norm:bn_4a_3x3
layer[9.2.5->9.2.6] = relu:relu_4a_3x3
## inception double 3x3
layer[9.3.0->9.3.1] = conv:conv_4a_double_3x3_reduce
kernel_size = 1
nchannel = 96
pad = 0
stride = 1
layer[9.3.1->9.3.2] = batch_norm:bn_4a_double_3x3_reduce
layer[9.3.2->9.3.3] = relu:relu_4a_double_3x3_reduce
layer[9.3.3->9.3.4] = conv:conv_4a_double_3x3_0
kernel_size = 3
nchannel = 128
pad = 1
stride = 1
layer[9.3.4->9.3.5] = batch_norm:bn_4a_double_3x3_0
layer[9.3.5->9.3.6] = relu:relu_4a_double_3x3_0
layer[9.3.6->9.3.7] = conv:conv_4a_double_3x3_1
kernel_size = 3
nchannel = 128
pad = 1
stride = 1
layer[9.3.7->9.3.8] = batch_norm:bn_4a_double_3x3_1
layer[9.3.8->9.3.9] = relu:relu_4a_double_3x3_1
## inception proj
layer[9.4.0->9.4.1] = avg_pooling:avg_pool_4a_pool
kernel_size = 3
stride = 1
pad = 1
layer[9.4.1->9.4.2] = conv:conv_4a_proj
kernel_size = 1
nchannel = 128
pad = 0
stride = 1
layer[9.4.2->9.4.3] = batch_norm:bn_4a_proj
layer[9.4.3->9.4.4] = relu:relu_4a_proj
layer[9.1.3,9.2.6,9.3.9,9.4.4->9] = ch_concat:ch_concat_4a_chconcat
##### inception 4b #####
layer[9->10.1.0,10.2.0,10.3.0,10.4.0] = split:split_4b_split
## inception 1x1
layer[10.1.0->10.1.1] = conv:conv_4b_1x1
kernel_size = 1
nchannel = 192
pad = 0
stride = 1
layer[10.1.1->10.1.2] = batch_norm:bn_4b_1x1
layer[10.1.2->10.1.3] = relu:relu_4b_1x1
## inception 3x3
layer[10.2.0->10.2.1] = conv:conv_4b_3x3_reduce
kernel_size = 1
nchannel = 96
pad = 0
stride = 1
layer[10.2.1->10.2.2] = batch_norm:bn_4b_3x3_reduce
layer[10.2.2->10.2.3] = relu:relu_4b_3x3_reduce
layer[10.2.3->10.2.4] = conv:conv_4b_3x3
kernel_size = 3
nchannel = 128
pad = 1
stride = 1
layer[10.2.4->10.2.5] = batch_norm:bn_4b_3x3
layer[10.2.5->10.2.6] = relu:relu_4b_3x3
## inception double 3x3
layer[10.3.0->10.3.1] = conv:conv_4b_double_3x3_reduce
kernel_size = 1
nchannel = 96
pad = 0
stride = 1
layer[10.3.1->10.3.2] = batch_norm:bn_4b_double_3x3_reduce
layer[10.3.2->10.3.3] = relu:relu_4b_double_3x3_reduce
layer[10.3.3->10.3.4] = conv:conv_4b_double_3x3_0
kernel_size = 3
nchannel = 128
pad = 1
stride = 1
layer[10.3.4->10.3.5] = batch_norm:bn_4b_double_3x3_0
layer[10.3.5->10.3.6] = relu:relu_4b_double_3x3_0
layer[10.3.6->10.3.7] = conv:conv_4b_double_3x3_1
kernel_size = 3
nchannel = 128
pad = 1
stride = 1
layer[10.3.7->10.3.8] = batch_norm:bn_4b_double_3x3_1
layer[10.3.8->10.3.9] = relu:relu_4b_double_3x3_1
## inception proj
layer[10.4.0->10.4.1] = avg_pooling:avg_pool_4b_pool
kernel_size = 3
stride = 1
pad = 1
layer[10.4.1->10.4.2] = conv:conv_4b_proj
kernel_size = 1
nchannel = 128
pad = 0
stride = 1
layer[10.4.2->10.4.3] = batch_norm:bn_4b_proj
layer[10.4.3->10.4.4] = relu:relu_4b_proj
layer[10.1.3,10.2.6,10.3.9,10.4.4->10] = ch_concat:ch_concat_4b_chconcat
##### inception 4c #####
layer[10->11.1.0,11.2.0,11.3.0,11.4.0] = split:split_4c_split
## inception 1x1
layer[11.1.0->11.1.1] = conv:conv_4c_1x1
kernel_size = 1
nchannel = 160
pad = 0
stride = 1
layer[11.1.1->11.1.2] = batch_norm:bn_4c_1x1
layer[11.1.2->11.1.3] = relu:relu_4c_1x1
## inception 3x3
layer[11.2.0->11.2.1] = conv:conv_4c_3x3_reduce
kernel_size = 1
nchannel = 128
pad = 0
stride = 1
layer[11.2.1->11.2.2] = batch_norm:bn_4c_3x3_reduce
layer[11.2.2->11.2.3] = relu:relu_4c_3x3_reduce
layer[11.2.3->11.2.4] = conv:conv_4c_3x3
kernel_size = 3
nchannel = 160
pad = 1
stride = 1
layer[11.2.4->11.2.5] = batch_norm:bn_4c_3x3
layer[11.2.5->11.2.6] = relu:relu_4c_3x3
## inception double 3x3
layer[11.3.0->11.3.1] = conv:conv_4c_double_3x3_reduce
kernel_size = 1
nchannel = 128
pad = 0
stride = 1
layer[11.3.1->11.3.2] = batch_norm:bn_4c_double_3x3_reduce
layer[11.3.2->11.3.3] = relu:relu_4c_double_3x3_reduce
layer[11.3.3->11.3.4] = conv:conv_4c_double_3x3_0
kernel_size = 3
nchannel = 160
pad = 1
stride = 1
layer[11.3.4->11.3.5] = batch_norm:bn_4c_double_3x3_0
layer[11.3.5->11.3.6] = relu:relu_4c_double_3x3_0
layer[11.3.6->11.3.7] = conv:conv_4c_double_3x3_1
kernel_size = 3
nchannel = 160
pad = 1
stride = 1
layer[11.3.7->11.3.8] = batch_norm:bn_4c_double_3x3_1
layer[11.3.8->11.3.9] = relu:relu_4c_double_3x3_1
## inception proj
layer[11.4.0->11.4.1] = avg_pooling:avg_pool_4c_pool
kernel_size = 3
stride = 1
pad = 1
layer[11.4.1->11.4.2] = conv:conv_4c_proj
kernel_size = 1
nchannel = 128
pad = 0
stride = 1
layer[11.4.2->11.4.3] = batch_norm:bn_4c_proj
layer[11.4.3->11.4.4] = relu:relu_4c_proj
layer[11.1.3,11.2.6,11.3.9,11.4.4->11] = ch_concat:ch_concat_4c_chconcat
##### inception 4d #####
layer[11->12.1.0,12.2.0,12.3.0,12.4.0] = split:split_4d_split
## inception 1x1
layer[12.1.0->12.1.1] = conv:conv_4d_1x1
kernel_size = 1
nchannel = 96
pad = 0
stride = 1
layer[12.1.1->12.1.2] = batch_norm:bn_4d_1x1
layer[12.1.2->12.1.3] = relu:relu_4d_1x1
## inception 3x3
layer[12.2.0->12.2.1] = conv:conv_4d_3x3_reduce
kernel_size = 1
nchannel = 128
pad = 0
stride = 1
layer[12.2.1->12.2.2] = batch_norm:bn_4d_3x3_reduce
layer[12.2.2->12.2.3] = relu:relu_4d_3x3_reduce
layer[12.2.3->12.2.4] = conv:conv_4d_3x3
kernel_size = 3
nchannel = 192
pad = 1
stride = 1
layer[12.2.4->12.2.5] = batch_norm:bn_4d_3x3
layer[12.2.5->12.2.6] = relu:relu_4d_3x3
## inception double 3x3
layer[12.3.0->12.3.1] = conv:conv_4d_double_3x3_reduce
kernel_size = 1
nchannel = 160
pad = 0
stride = 1
layer[12.3.1->12.3.2] = batch_norm:bn_4d_double_3x3_reduce
layer[12.3.2->12.3.3] = relu:relu_4d_double_3x3_reduce
layer[12.3.3->12.3.4] = conv:conv_4d_double_3x3_0
kernel_size = 3
nchannel = 192
pad = 1
stride = 1
layer[12.3.4->12.3.5] = batch_norm:bn_4d_double_3x3_0
layer[12.3.5->12.3.6] = relu:relu_4d_double_3x3_0
layer[12.3.6->12.3.7] = conv:conv_4d_double_3x3_1
kernel_size = 3
nchannel = 192
pad = 1
stride = 1
layer[12.3.7->12.3.8] = batch_norm:bn_4d_double_3x3_1
layer[12.3.8->12.3.9] = relu:relu_4d_double_3x3_1
## inception proj
layer[12.4.0->12.4.1] = avg_pooling:avg_pool_4d_pool
kernel_size = 3
stride = 1
pad = 1
layer[12.4.1->12.4.2] = conv:conv_4d_proj
kernel_size = 1
nchannel = 128
pad = 0
stride = 1
layer[12.4.2->12.4.3] = batch_norm:bn_4d_proj
layer[12.4.3->12.4.4] = relu:relu_4d_proj
layer[12.1.3,12.2.6,12.3.9,12.4.4->12] = ch_concat:ch_concat_4d_chconcat
##### inception 4e #####
layer[12->13.2.0,13.3.0,13.4.0] = split:split_4e_split
## inception 3x3
layer[13.2.0->13.2.1] = conv:conv_4e_3x3_reduce
kernel_size = 1
nchannel = 128
pad = 0
stride = 1
layer[13.2.1->13.2.2] = batch_norm:bn_4e_3x3_reduce
layer[13.2.2->13.2.3] = relu:relu_4e_3x3_reduce
layer[13.2.3->13.2.4] = conv:conv_4e_3x3
kernel_size = 3
nchannel = 192
pad = 1
stride = 2
layer[13.2.4->13.2.5] = batch_norm:bn_4e_3x3
layer[13.2.5->13.2.6] = relu:relu_4e_3x3
## inception double 3x3
layer[13.3.0->13.3.1] = conv:conv_4e_double_3x3_reduce
kernel_size = 1
nchannel = 192
pad = 0
stride = 1
layer[13.3.1->13.3.2] = batch_norm:bn_4e_double_3x3_reduce
layer[13.3.2->13.3.3] = relu:relu_4e_double_3x3_reduce
layer[13.3.3->13.3.4] = conv:conv_4e_double_3x3_0
kernel_size = 3
nchannel = 256
pad = 1
stride = 1
layer[13.3.4->13.3.5] = batch_norm:bn_4e_double_3x3_0
layer[13.3.5->13.3.6] = relu:relu_4e_double_3x3_0
layer[13.3.6->13.3.7] = conv:conv_4e_double_3x3_1
kernel_size = 3
nchannel = 256
pad = 1
stride = 2
layer[13.3.7->13.3.8] = batch_norm:bn_4e_double_3x3_1
layer[13.3.8->13.3.9] = relu:relu_4e_double_3x3_1
## inception proj
layer[13.4.0->13.4.1] = max_pooling:max_pool_4e_pool
kernel_size = 3
stride = 2
layer[13.2.6,13.3.9,13.4.1->13] = ch_concat:ch_concat_4e_chconcat
##### inception 5a #####
layer[13->14.1.0,14.2.0,14.3.0,14.4.0] = split:split_5a_split
## inception 1x1
layer[14.1.0->14.1.1] = conv:conv_5a_1x1
kernel_size = 1
nchannel = 352
pad = 0
stride = 1
layer[14.1.1->14.1.2] = batch_norm:bn_5a_1x1
layer[14.1.2->14.1.3] = relu:relu_5a_1x1
## inception 3x3
layer[14.2.0->14.2.1] = conv:conv_5a_3x3_reduce
kernel_size = 1
nchannel = 192
pad = 0
stride = 1
layer[14.2.1->14.2.2] = batch_norm:bn_5a_3x3_reduce
layer[14.2.2->14.2.3] = relu:relu_5a_3x3_reduce
layer[14.2.3->14.2.4] = conv:conv_5a_3x3
kernel_size = 3
nchannel = 320
pad = 1
stride = 1
layer[14.2.4->14.2.5] = batch_norm:bn_5a_3x3
layer[14.2.5->14.2.6] = relu:relu_5a_3x3
## inception double 3x3
layer[14.3.0->14.3.1] = conv:conv_5a_double_3x3_reduce
kernel_size = 1
nchannel = 160
pad = 0
stride = 1
layer[14.3.1->14.3.2] = batch_norm:bn_5a_double_3x3_reduce
layer[14.3.2->14.3.3] = relu:relu_5a_double_3x3_reduce
layer[14.3.3->14.3.4] = conv:conv_5a_double_3x3_0
kernel_size = 3
nchannel = 224
pad = 1
stride = 1
layer[14.3.4->14.3.5] = batch_norm:bn_5a_double_3x3_0
layer[14.3.5->14.3.6] = relu:relu_5a_double_3x3_0
layer[14.3.6->14.3.7] = conv:conv_5a_double_3x3_1
kernel_size = 3
nchannel = 224
pad = 1
stride = 1
layer[14.3.7->14.3.8] = batch_norm:bn_5a_double_3x3_1
layer[14.3.8->14.3.9] = relu:relu_5a_double_3x3_1
## inception proj
layer[14.4.0->14.4.1] = avg_pooling:avg_pool_5a_pool
kernel_size = 3
stride = 1
pad = 1
layer[14.4.1->14.4.2] = conv:conv_5a_proj
kernel_size = 1
nchannel = 128
pad = 0
stride = 1
layer[14.4.2->14.4.3] = batch_norm:bn_5a_proj
layer[14.4.3->14.4.4] = relu:relu_5a_proj
layer[14.1.3,14.2.6,14.3.9,14.4.4->14] = ch_concat:ch_concat_5a_chconcat
##### inception 5b #####
layer[14->15.1.0,15.2.0,15.3.0,15.4.0] = split:split_5b_split
## inception 1x1
layer[15.1.0->15.1.1] = conv:conv_5b_1x1
kernel_size = 1
nchannel = 352
pad = 0
stride = 1
layer[15.1.1->15.1.2] = batch_norm:bn_5b_1x1
layer[15.1.2->15.1.3] = relu:relu_5b_1x1
## inception 3x3
layer[15.2.0->15.2.1] = conv:conv_5b_3x3_reduce
kernel_size = 1
nchannel = 192
pad = 0
stride = 1
layer[15.2.1->15.2.2] = batch_norm:bn_5b_3x3_reduce
layer[15.2.2->15.2.3] = relu:relu_5b_3x3_reduce
layer[15.2.3->15.2.4] = conv:conv_5b_3x3
kernel_size = 3
nchannel = 320
pad = 1
stride = 1
layer[15.2.4->15.2.5] = batch_norm:bn_5b_3x3
layer[15.2.5->15.2.6] = relu:relu_5b_3x3
## inception double 3x3
layer[15.3.0->15.3.1] = conv:conv_5b_double_3x3_reduce
kernel_size = 1
nchannel = 192
pad = 0
stride = 1
layer[15.3.1->15.3.2] = batch_norm:bn_5b_double_3x3_reduce
layer[15.3.2->15.3.3] = relu:relu_5b_double_3x3_reduce
layer[15.3.3->15.3.4] = conv:conv_5b_double_3x3_0
kernel_size = 3
nchannel = 224
pad = 1
stride = 1
layer[15.3.4->15.3.5] = batch_norm:bn_5b_double_3x3_0
layer[15.3.5->15.3.6] = relu:relu_5b_double_3x3_0
layer[15.3.6->15.3.7] = conv:conv_5b_double_3x3_1
kernel_size = 3
nchannel = 224
pad = 1
stride = 1
layer[15.3.7->15.3.8] = batch_norm:bn_5b_double_3x3_1
layer[15.3.8->15.3.9] = relu:relu_5b_double_3x3_1
## inception proj
layer[15.4.0->15.4.1] = max_pooling:max_pool_5b_pool
kernel_size = 3
stride = 1
pad = 1
layer[15.4.1->15.4.2] = conv:conv_5b_proj
kernel_size = 1
nchannel = 128
pad = 0
stride = 1
layer[15.4.2->15.4.3] = batch_norm:bn_5b_proj
layer[15.4.3->15.4.4] = relu:relu_5b_proj
layer[15.1.3,15.2.6,15.3.9,15.4.4->15] = ch_concat:ch_concat_5b_chconcat
layer[15->16] = avg_pooling:global_pool
kernel_size = 7
stride = 1
layer[+1] = flatten:flatten
layer[+1] = fullc:fc
nhidden = 1000
layer[+0] = softmax:softmax
netconfig = end
# evaluation metric
metric = rec@1
metric = rec@5
max_round = 100
num_round = 100
# input shape not including batch
input_shape = 3,224,224
batch_size = 64
update_period = 2
# global parameters in any section outside netconfig and iter
momentum = 0.9
wmat:lr = 0.05
wmat:wd = 0.0001
bias:wd = 0.000
bias:lr = 0.1
# all learning rate schedule settings start with lr
lr:schedule = constant
save_model=1
model_dir=models
print_step=1
clip_gradient = 10
# random config
random_type = xavier
# new line
dev = gpu:0-3
kaiming.conf
# Configuration for ImageNet
# Acknowledgement:
# Ref: He, Kaiming, and Jian Sun. "Convolutional Neural Networks at Constrained Time Cost." CVPR 2015
# the J' model from the paper above
data = train
iter = imgrec
# image_list = "/media/DATA1/Imagenet/train_list_shuffle.lst"
image_rec = "/media/DATA1/Imagenet/train_shuffle1.bin"
image_mean = "models/kmean_224.bin"
rand_crop=1
rand_mirror=1
min_crop_size=192
max_crop_size=224
max_aspect_ratio=0.3
iter = threadbuffer
iter = end
eval = val
iter = imgrec
# image_list = "/media/DATA1/Imagenet/val_list.lst"
image_rec = "/media/DATA1/Imagenet/val_shuffle.bin"
image_mean = "models/kmean_224.bin"
# no random crop and mirror in test
iter = end
###### Stage 1 #######
netconfig=start
layer[0->1] = conv:conv1
kernel_size = 7
stride = 2
nchannel = 64
layer[1->2] = relu:relu1
layer[2->3] = max_pooling
kernel_size = 3
###### Stage 2 #######
layer[3->4] = conv:conv2
nchannel = 128
kernel_size = 2
stride = 3
layer[4->5] = relu:relu2
layer[5->6] = conv:conv3
nchannel = 128
kernel_size = 2
pad = 1
layer[6->7] = relu:relu3
layer[7->8] = conv:conv4
nchannel = 128
kernel_size = 2
layer[8->9] = relu:relu4
layer[9->10] = conv:conv5
nchannel = 128
kernel_size = 2
pad = 1
layer[10->11] = relu:relu5
layer[11->12] = max_pooling:pool1
kernel_size = 3
###### Stage 3 #######
layer[12->13] = conv:conv6
nchannel = 256
kernel_size = 2
stride = 2
layer[13->14] = relu:relu6
layer[14->15] = conv:conv7
nchannel = 256
kernel_size = 2
pad = 1
layer[15->16] = relu:relu7
layer[16->17] = conv:conv8
nchannel = 256
kernel_size = 2
layer[17->18] = relu:relu8
layer[18->19] = conv:conv9
nchannel = 256
kernel_size = 2
pad = 1
layer[19->20] = relu:relu9
layer[20->21] = max_pooling:pool2
kernel_size = 3
###### Stage 4 #######
layer[21->22] = conv:conv10
nchannel = 2304
kernel_size = 2
stride = 3
layer[22->23] = relu:relu10
layer[23->24] = conv:conv11
nchannel = 256
kernel_size = 2
pad = 1
layer[24->25] = relu:relu11
###### Stage 5 #######
layer[25->26,27,28,29] = split:split1
layer[26->30] = max_pooling:pool3
kernel_size = 1
stride = 1
layer[27->31] = max_pooling:pool4
kernel_size = 2
stride = 2
layer[28->32] = max_pooling:pool5
kernel_size = 3
stride = 3
layer[29->33] = max_pooling:pool6
kernel_size = 6
stride = 6
layer[30->34] = flatten:f1
layer[31->35] = flatten:f2
layer[32->36] = flatten:f3
layer[33->37] = flatten:f4
layer[34,35,36,37->38] = concat:concat1
###### Stage 6 #######
layer[38->39] = fullc:fc1
nhidden = 4096
layer[39->40] = relu:relu12
layer[40->40] = dropout
threshold = 0.5
layer[40->41] = fullc:fc2
nhidden = 4096
layer[41->42] = relu:relu13
layer[42->42] = dropout
threshold = 0.5
layer[42->43] = fullc:fc3
nhidden = 1000
layer[43->43] = softmax:softmax1
netconfig=end
# evaluation metric
metric = rec@1
metric = rec@5
max_round = 100
num_round = 100
# input shape not including batch
input_shape = 3,224,224
batch_size = 128
# global parameters in any section outside netconfig and iter
momentum = 0.9
wmat:lr = 0.01
wmat:wd = 0.0005
bias:wd = 0.000
bias:lr = 0.02
# all learning rate schedule settings start with lr
lr:schedule = factor
lr:gamma = 0.1
lr:step = 300000
save_model=1
model_dir=models
print_step=1
# random config
random_type = xavier
# new line
dev = gpu:0-3
No, I mean your input data.
Sorry, I was editing the comment format just now.
I used im2rec to resize all the images to 256*256,
and I used the 2012 ImageNet data.
Check your augmentation setting.
On Tue, Jul 21, 2015 at 20:49 张明瑞 notifications@github.com wrote:
Is anyone there? Waiting online, it's pretty urgent.
The config is the same as another person's, but with his cxxnet executable I can run the training successfully... I don't know what the difference is ...
We used nearly the same config.mk:
# choice of compiler
export CC = gcc
export CXX = g++
export NVCC = nvcc
# whether to use CUDA during compilation
USE_CUDA = 1
# add the path to the CUDA library to the link and compile flags
# if you have already added them to environment variables, leave it as NONE
USE_CUDA_PATH = /usr/local/cuda
# whether to use OpenCV during compilation
# you can disable it; however, you will not be able to use
# the imbin iterator
USE_OPENCV = 1
USE_OPENCV_DECODER = 1
# whether to use the CUDNN R3 library
USE_CUDNN = 1
# add the path to the CUDNN library to the link and compile flags
# if you do not need it, or do not have it, leave it as NONE
USE_CUDNN_PATH = /home/mrzhang/Downloads/cudnn-6.5-linux-x64-v2
# whether to build caffe converter
USE_CAFFE_CONVERTER = 0
CAFFE_ROOT =
CAFFE_INCLUDE =
CAFFE_LIB =
#
# choose the version of blas you want to use
# can be: mkl, blas, atlas, openblas
USE_STATIC_MKL = /opt/intel/composer_xe_2015.0.090
USE_BLAS = mkl
#
# add the path to the Intel library; you may need it
# for MKL if you did not add the path to an environment variable
#
USE_INTEL_PATH = /opt/intel
# whether to compile with the parameter server
USE_DIST_PS = 1
PS_PATH = ./ps-lite
PS_THIRD_PATH = NONE
# whether to compile with rabit
USE_RABIT_PS = 0
RABIT_PATH = /home/mrzhang/Downloads/rabit
# use the OpenMP iterator
USE_OPENMP_ITER = 1
# the additional link flags you want to add
ADD_LDFLAGS = -ljpeg
# the additional compile flags you want to add
ADD_CFLAGS = -I /usr/local/cuda/bin
#
# If using MKL, choose static linking automatically to fix the Python wrapper
#
ifeq ($(USE_BLAS), mkl)
USE_STATIC_MKL = 1
endif
#------------------------
# configuration for DMLC
#------------------------
# whether to use HDFS support during compilation
# this will allow cxxnet to directly save/load models from HDFS
USE_HDFS = 0
# whether to use AWS S3 support during compilation
# this will allow cxxnet to directly save/load models from S3
USE_S3 = 0
# path to libjvm.so
LIBJVM=$(JAVA_HOME)/jre/lib/amd64/server
USE_GLOG = 1
Sorry, it still doesn't work, but using the other person's bin seems to run fine.
Maybe something is wrong with OpenCV.
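(A quick way to rule OpenCV itself out is a minimal in-bounds crop; this is just a sketch, compiled against the same OpenCV build, using a hypothetical 256x256 Mat. If it runs cleanly, the library is fine and the out-of-range ROI must come from cxxnet's iterator:)
#include <opencv2/core/core.hpp>
#include <cstdio>

int main() {
  cv::Mat m(256, 256, CV_8UC3);    // dummy image
  cv::Rect roi(16, 16, 224, 224);  // fully inside the 256x256 Mat
  cv::Mat crop = m(roi);           // must not trigger the ROI assertion
  std::printf("crop: %dx%d\n", crop.cols, crop.rows);
  return 0;
}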
When I run ../../bin/cxxnet bowl.conf, I also hit the same problem. Can anyone solve this?
Use CUDA Device 0: GeForce GTX 970
finish initialization with 1 devices
Initializing layer: 0
Initializing layer: 1
Initializing layer: 2
Initializing layer: 3
Initializing layer: 4
Initializing layer: 5
Initializing layer: 6
Initializing layer: 7
Initializing layer: 8
Initializing layer: 9
Initializing layer: 10
Initializing layer: 11
Initializing layer: 12
Initializing layer: 13
Initializing layer: 14
Initializing layer: 15
Initializing layer: 16
SGDUpdater: eta=0.001000, mom=0.900000
SGDUpdater: eta=0.002000, mom=0.900000
SGDUpdater: eta=0.001000, mom=0.900000
SGDUpdater: eta=0.002000, mom=0.900000
SGDUpdater: eta=0.001000, mom=0.900000
SGDUpdater: eta=0.002000, mom=0.900000
SGDUpdater: eta=0.001000, mom=0.900000
SGDUpdater: eta=0.002000, mom=0.900000
SGDUpdater: eta=0.001000, mom=0.900000
SGDUpdater: eta=0.002000, mom=0.900000
SGDUpdater: eta=0.001000, mom=0.900000
SGDUpdater: eta=0.002000, mom=0.900000
SGDUpdater: eta=0.001000, mom=0.900000
SGDUpdater: eta=0.002000, mom=0.900000
node[in].shape: 64,3,40,40
node[!node-after-0].shape: 64,48,41,41
node[!node-after-1].shape: 64,48,41,41
node[!node-after-2].shape: 64,48,20,20
node[!node-after-3].shape: 64,96,20,20
node[!node-after-4].shape: 64,96,20,20
node[!node-after-5].shape: 64,96,20,20
node[!node-after-6].shape: 64,96,20,20
node[!node-after-7].shape: 64,96,10,10
node[!node-after-8].shape: 64,128,9,9
node[!node-after-9].shape: 64,128,9,9
node[!node-after-10].shape: 64,128,7,7
node[!node-after-11].shape: 64,128,3,3
node[!node-after-12].shape: 64,1,1,1152
node[!node-after-13].shape: 64,1,1,256
node[!node-after-14].shape: 64,1,1,121
[17:14:09] src/io/iter_image_recordio-inl.hpp:68: Loaded ImageList from /home/meitu/cxxnet-master/example/kaggle_bowl/tr.lst 20000 Image records
cannot find /home/meitu/cxxnet-master/example/kaggle_bowl/models/image_mean.bin: create mean image, this will take some time...
OpenCV Error: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in Mat, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/matrix.cpp, line 323
OpenCV Error: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in Mat, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/matrix.cpp, line 323
terminate called recursively
terminate called after throwing an instance of 'cv::Exception'
Aborted (core dumped)
@weihaoxie can you paste the first 5 lines of your tr.lst? Maybe there is a path problem.
Ours is the latest version.
@DrustZ So your friend shares the same code version with you, but his bin works on your machine? OK ...
The first 5 lines of tr.lst are as follows. Is anything wrong?
3406 10 data/train/chaetognath_non_sagitta/119995.jpg
22212 90 data/train/radiolarian_chain/89297.jpg
19772 83 data/train/protist_fuzzy_olive/48175.jpg
23435 98 data/train/siphonophore_calycophoran_rocketship_young/103178.jpg
18710 72 data/train/hydromedusae_solmaris/80365.jpg
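(For reference, one quick way to rule out the path problem mentioned above is to check that every image path in the list resolves from the directory cxxnet is run in. A minimal sketch, assuming the three whitespace-separated columns shown above:)
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

int main() {
  std::ifstream lst("tr.lst");  // columns: index, label, image path
  std::string line;
  int missing = 0;
  while (std::getline(lst, line)) {
    std::istringstream ss(line);
    std::string index, label, path;
    if (!(ss >> index >> label >> path)) continue;
    std::ifstream img(path.c_str(), std::ios::binary);
    if (!img) { std::printf("missing: %s\n", path.c_str()); ++missing; }
  }
  std::printf("%d missing files\n", missing);
  return 0;
}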
Well, it seems there's a hidden bug in cxxnet.
I rebuilt cxxnet against different OpenCV versions, from 2.4.8 and 2.4.9 to 3.0.0, all in vain.
I tried generating different test data, and every bin I used produced the same problem:
OpenCV Error: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in Mat, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/matrix.cpp, line 323
OpenCV Error: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in Mat, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/matrix.cpp, line 323
OpenCV Error: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in Mat, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/matrix.cpp, line 323
OpenCV Error: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in Mat, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/matrix.cpp, line 323
It's frustrating; it ruined my whole week trying to get past it.
FOUND THE BUG:
In the latest version, the IO system is broken.
Git blame:
commit "fix rand_crop #197"
in src/io/image_augmenter-inl.hpp,
lines 110 to 140:
- line 123:
cv::Rect roi(x, y, rand_crop_size, rand_crop_size);
- line 139:
cv::Rect roi(x, y, shape_[1], shape_[2]);
Please check the file.
Also: before merging, please at least run the simplest test @superzrx
@DrustZ
Yes, there is a bug when using min_crop_size or max_crop_size: there is a duplicated line,
res = res(roi);
Thank you.
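(For anyone hitting this later, a minimal sketch of why a duplicated res = res(roi); trips the assertion. The sizes here are hypothetical; in cxxnet they come from the random crop chosen between min_crop_size and max_crop_size:)
#include <opencv2/core/core.hpp>

int main() {
  cv::Mat res(256, 256, CV_8UC3);  // decoded input image, e.g. 256x256
  cv::Rect roi(20, 20, 224, 224);  // crop rectangle chosen for the 256x256 image
  res = res(roi);                  // first crop: res is now 224x224, fine
  res = res(roi);                  // duplicated line: 20 + 224 > 224 columns,
                                   // so Mat's ROI constructor throws the
                                   // "0 <= roi.x && ... <= m.cols" assertion above
  return 0;
}
The second application reuses offsets computed for the original image on the already-cropped one, which is exactly the out-of-range ROI the logs show.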
@weihaoxie Just fixed it; this should be solved now.