dmlc/cxxnet

CNN training problem

supengyu opened this issue · 0 comments

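While training on the CIFAR-10 image set with the Windows build of cxxnet, I found that the error rate stays high throughout training and never drops noticeably, as shown below: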
initializing end, start working
round 0:[ 300] 232 sec elapsed[1] train-error:0.898737 train-rec@1:0.101263 train-rec@5:0.497522 test-error:0.9 test-rec@1:0.1 test-rec@5:0.5
round 1:[ 300] 564 sec elapsed[2] train-error:0.899077 train-rec@1:0.100923 train-rec@5:0.4998 test-error:0.9 test-rec@1:0.1 test-rec@5:0.5
round 2:[ 300] 898 sec elapsed[3] train-error:0.902214 train-rec@1:0.0977861 train-rec@5:0.498302 test-error:0.9 test-rec@1:0.1 test-rec@5:0.5
round 3:[ 300] 1223 sec elapsed[4] train-error:0.899636 train-rec@1:0.100364 train-rec@5:0.501139 test-error:0.9 test-rec@1:0.1 test-rec@5:0.5
round 4:[ 300] 1545 sec elapsed[5] train-error:0.901954 train-rec@1:0.0980459 train-rec@5:0.4999 test-error:0.9 test-rec@1:0.1 test-rec@5:0.5
round 5:[ 300] 1868 sec elapsed[6] train-error:0.900595 train-rec@1:0.0994046 train-rec@5:0.498122 test-error:0.9 test-rec@1:0.1 test-rec@5:0.5
round 6:[ 300] 2187 sec elapsed[7] train-error:0.897818 train-rec@1:0.102182 train-rec@5:0.500939 test-error:0.9 test-rec@1:0.1 test-rec@5:0.5
round 7:[ 300] 2506 sec elapsed[8] train-error:0.899756 train-rec@1:0.100244 train-rec@5:0.502218 test-error:0.9 test-rec@1:0.1 test-rec@5:0.5
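
For reference, these numbers sit almost exactly at chance level for a 10-class problem (error 0.9, rec@1 0.1, rec@5 0.5), which would be consistent with the network's outputs carrying no information about the input. The following is only a minimal sketch of that check, not part of cxxnet: the helper names `parse_metrics`, `chance_level`, and `near_chance` and the tolerance of 0.01 are my own assumptions, and the two log lines are copied from the output above with the console line wrapping undone.

```python
import re

# Two evaluation lines from the log above (first and last round),
# with the console line wrapping undone.
LOG = """\
round 0:[ 300] 232 sec elapsed[1] train-error:0.898737 train-rec@1:0.101263 train-rec@5:0.497522 test-error:0.9 test-rec@1:0.1 test-rec@5:0.5
round 7:[ 300] 2506 sec elapsed[8] train-error:0.899756 train-rec@1:0.100244 train-rec@5:0.502218 test-error:0.9 test-rec@1:0.1 test-rec@5:0.5
"""

NUM_CLASSES = 10  # CIFAR-10 has 10 classes


def parse_metrics(line):
    """Return {metric_name: value} for every train-*/test-* pair on a log line."""
    pairs = re.findall(r"((?:train|test)-[\w@]+):([0-9.]+)", line)
    return {name: float(value) for name, value in pairs}


def chance_level(num_classes):
    """Expected metrics for a classifier that guesses uniformly at random."""
    return {
        "error": 1 - 1 / num_classes,
        "rec@1": 1 / num_classes,
        "rec@5": 5 / num_classes,
    }


def near_chance(metrics, num_classes, tol=0.01):
    """True if every parsed metric is within tol of its chance-level value."""
    expected = chance_level(num_classes)
    return all(
        abs(value - expected[name.split("-", 1)[1]]) <= tol
        for name, value in metrics.items()
    )


if __name__ == "__main__":
    for line in LOG.splitlines():
        metrics = parse_metrics(line)
        print(line.split(":")[0], "near chance:", near_chance(metrics, NUM_CLASSES))
```

If every round reports near chance on both the training and the test set, the problem is more likely in the input pipeline or the optimization setup than in overfitting, although the log alone cannot tell which.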

The training config file is as follows:
data = train
iter = imgrec

image_list = "../../NameList.train"

image_rec = "E:/deepLearning/cxxnet-master/bin/data/cifar10train.bin"
image_root = "E:/deepLearning/cifar-10/cifar-10-py-colmajor/train_batch/"
image_mean = "E:/deepLearning/cxxnet-master/bin/models/image_net_mean.bin"
rand_crop=1
rand_mirror=1
iter = threadbuffer
iter = end

eval = test
iter = imgrec

image_list = "../../NameList.test"

image_rec = "E:/deepLearning/cxxnet-master/bin/data/cifar10test.bin"
image_root = "E:/deepLearning/cifar-10/cifar-10-py-colmajor/test_batch/"
image_mean = "E:/deepLearning/cxxnet-master/bin/models/image_net_mean.bin"

# no random crop and mirror in test

iter = end

netconfig=start
layer[0->1] = conv:conv1
kernel_size = 5
stride = 1
nchannel = 64
layer[1->2] = relu:relu1
layer[2->3] = max_pooling:pool1
kernel_size = 3
stride = 2
layer[3->4] = lrn:lrn1
local_size = 5
alpha = 0.0001
beta = 0.75
knorm = 1

layer[4->5] = conv:conv2
ngroup = 1
nchannel = 64
kernel_size = 5
pad = 1
layer[5->6] = relu:relu2
layer[6->7] = max_pooling:pool2
kernel_size = 3
stride = 2
layer[7->8] = lrn:lrn2
local_size = 5
alpha = 0.0001
beta = 0.75
knorm = 1

layer[8->9] = conv:conv3
nchannel = 128
kernel_size = 3
pad = 1
layer[9->10] = relu:relu3
layer[10->11] = max_pooling:pool3
kernel_size = 3
stride = 2
layer[11->12] = flatten:flatten1
layer[12->13] = fullc:fc4
nhidden = 1024
init_sigma = 0.005
init_bias = 1.0
layer[13->14] = relu:relu4
layer[14->14] = dropout:dropout1
threshold = 0.5
layer[14->15] = fullc:fc5
nhidden = 10
layer[15->15] = softmax:softmax1

netconfig=end

# evaluation metric

metric = error
metric = rec@1
metric = rec@5

max_round = 45
num_round = 45

# input shape not including batch

input_shape = 3,32,32

batch_size = 128

# global parameters in any section outside netconfig, and iter

momentum = 0.9
wmat:lr = 0.01
wmat:wd = 0.0005

bias:wd = 0.000
bias:lr = 0.02

# all the learning rate schedule starts with lr

lr:schedule = expdecay
lr:gamma = 0.1
lr:step = 100000

save_model=1
model_dir=models

# random config

random_type = xavier

# new line

What might be causing this? Thanks.