This is an unofficial experiment that applies center loss to the SSD multibox_loss function.
Technical details are in the paper: A Discriminative Feature Learning Approach for Deep Face Recognition https://pan.baidu.com/s/1up_PWpR85HqVe10yhFzHoQ
SSD (Single Shot MultiBox Detector) implements the multibox_loss function in https://github.com/weiliu89/caffe/tree/ssd. The loss function can be read in the source files multibox_loss_layer.h/multibox_loss_layer.cpp.
When detecting objects in an image, detectors such as SSD usually employ a softmax function to classify each object and L1 regression (smooth L1 in SSD) to localize it.
In this formulation, learning the location information may be easy even for objects that look similar to each other. The softmax classifier, however, struggles because the features of the foreground samples are so alike. Center loss can effectively reduce the feature variation among samples of the same object class.
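The idea can be written out as a short sketch. The first line is the standard SSD objective and the second is the center loss from the paper cited above. The third line, including the weight \lambda, is an assumption about how the two are combined and is not taken from the layer code. The symbols f_i (feature of sample i) and \mu_{y_i} (center of its class) are renamed here to avoid a clash with SSD's c, which denotes class confidences.

```latex
% SSD objective: N is the number of matched default boxes,
% \alpha weights the localization term (loc_weight in the prototxt).
L_{\mathrm{ssd}}(x, c, l, g) = \frac{1}{N}\left( L_{\mathrm{conf}}(x, c) + \alpha\, L_{\mathrm{loc}}(x, l, g) \right)

% Center loss over m samples: pulls each feature toward the center of its class.
L_{C} = \frac{1}{2} \sum_{i=1}^{m} \left\| f_i - \mu_{y_i} \right\|_2^2

% Assumed combination; \lambda is a hypothetical balancing weight.
L = L_{\mathrm{ssd}} + \lambda\, L_{C}
```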
In the commands below, ~~ stands for the path under which your caffe-ssd directory lives.
cp center_loss_layer.cpp ~~/caffe-ssd/src/caffe/layers/
cp center_loss_layer.h ~~/caffe-ssd/include/caffe/layers/
cp multibox_center_loss_layer.cpp ~~/caffe-ssd/src/caffe/layers/
cp multibox_center_loss_layer.hpp ~~/caffe-ssd/include/caffe/layers/
Add the following messages to ~~/caffe-ssd/src/caffe/proto/caffe.proto:
message CenterLossParameter {
  optional uint32 num_output = 1; // The number of outputs for the layer
  optional FillerParameter center_filler = 2; // The filler for the centers
  // The first axis to be lumped into a single inner product computation;
  // all preceding axes are retained in the output.
  // May be negative to index from the end (e.g., -1 for the last axis).
  optional int32 axis = 3 [default = 1];
}
message MultiBoxCenterLossParameter {
  // center_features is the length of the feature vector predicted for each
  // default box; it equals the length of each object center.
  optional uint32 center_features = 1;
}
Add the following fields to the message LayerParameter:
optional MultiBoxCenterLossParameter multibox_center_loss_param = 211; // this field number must be unique within the message
optional CenterLossParameter center_loss_param = 149;
For example, the fc7_norm layer has 4 anchors per location, with aspect ratios sqrt(2), 1, 1/2, and 2. Each anchor has 16 center_features, so num_output is 4 x 16 = 64.
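The arithmetic above can be checked with a minimal sketch. The helper name center_conv_num_output is hypothetical and exists only for illustration; it is not part of Caffe or of this repository.

```python
def center_conv_num_output(num_anchors: int, center_features: int) -> int:
    """Channels the center-feature convolution needs on one source layer.

    Hypothetical helper: every anchor at a location predicts its own
    feature vector of length center_features.
    """
    return num_anchors * center_features


# fc7_norm: 4 anchors per location, 16 center features per anchor.
fc7_norm_channels = center_conv_num_output(num_anchors=4, center_features=16)
print(fc7_norm_channels)  # 64
```

The same rule would apply to any other source layer: a layer with 6 anchors and the same center_features would need 96 output channels.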
layer {
  name: "fc7_norm_center_mbox_conf_new"
  type: "Convolution"
  bottom: "fc7_norm"
  top: "fc7_norm_center_mbox_conf"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "fc7_norm_center_mbox_conf_perm"
  type: "Permute"
  bottom: "fc7_norm_center_mbox_conf"
  top: "fc7_norm_center_mbox_conf_perm"
  permute_param {
    order: 0
    order: 2
    order: 3
    order: 1
  }
}
layer {
  name: "fc7_norm_mbox_center_conf_flat"
  type: "Flatten"
  bottom: "fc7_norm_center_mbox_conf_perm"
  top: "fc7_norm_mbox_center_conf_flat"
  flatten_param {
    axis: 1
  }
}
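The effect of the Permute and Flatten layers on the blob shape can be sketched as follows. The function names are hypothetical, the 19 x 19 feature-map size is only an illustrative input, and the anchor-major channel layout (channel = anchor * center_features + feature) is an assumption, not something confirmed by the layer code.

```python
def permute_flatten_shape(n, c, h, w):
    """Shapes after Permute(order 0, 2, 3, 1) and then Flatten(axis=1)."""
    permuted = (n, h, w, c)   # channels moved to the last axis
    flat = (n, h * w * c)     # everything after the batch axis collapsed
    return permuted, flat


def flat_index(y, x, anchor, feature, width, num_anchors, center_features):
    """Offset of one center feature inside a flattened sample.

    Assumes the channel axis is laid out anchor-major.
    """
    channel = anchor * center_features + feature
    return (y * width + x) * num_anchors * center_features + channel


permuted, flat = permute_flatten_shape(n=8, c=64, h=19, w=19)
print(permuted)  # (8, 19, 19, 64)
print(flat)      # (8, 23104)
```

After the permutation, all features that belong to one location and one anchor sit next to each other in memory, which is what lets a loss layer read them as contiguous vectors of length center_features.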
layer {
  name: "mbox_loss"
  type: "MultiBoxCenterLoss" # changed from MultiBoxLoss
  bottom: "mbox_loc"
  bottom: "mbox_conf"
  bottom: "mbox_priorbox"
  bottom: "label"
  bottom: "mbox_center_conf" # concatenation of the center_features of all default boxes
  top: "mbox_loss"
  include {
    phase: TRAIN
  }
  propagate_down: true
  propagate_down: true
  propagate_down: false
  propagate_down: false
  propagate_down: true # the center_features layers need the backward pass
  loss_param {
    normalization: VALID
  }
  multibox_loss_param {
    loc_loss_type: SMOOTH_L1
    conf_loss_type: SOFTMAX
    loc_weight: 1
    num_classes: 21
    share_location: true
    match_type: PER_PREDICTION
    overlap_threshold: 0.2
    use_prior_for_matching: true
    background_label_id: 0
    use_difficult_gt: true
    neg_pos_ratio: 3
    neg_overlap: 0.1
    code_type: CENTER_SIZE
    ignore_cross_boundary_bbox: false
    mining_type: MAX_NEGATIVE
  }
  multibox_center_loss_param {
    center_features: 16 # length of the feature vector per default box, equal to the length of each object center
  }
}
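To make the role of mbox_center_conf and center_features concrete, here is a minimal pure-Python sketch of a center loss over matched default boxes. It is not the layer's actual implementation: the helper names, the toy data, and the choice to use only matched (positive) boxes are assumptions made for illustration. The loss follows the formula in the cited paper, 0.5 times the sum of squared distances between each feature and its class center.

```python
def split_per_prior(flat, center_features):
    """Cut a flattened sample into one feature vector per default box."""
    return [flat[i:i + center_features]
            for i in range(0, len(flat), center_features)]


def center_loss(features, labels, centers):
    """Return the loss 0.5 * sum ||f_i - c_{y_i}||^2 and d(loss)/d(features)."""
    loss = 0.0
    grads = []
    for feature, label in zip(features, labels):
        diff = [a - b for a, b in zip(feature, centers[label])]
        loss += 0.5 * sum(d * d for d in diff)
        grads.append(diff)  # the gradient w.r.t. a feature is (f_i - c_{y_i})
    return loss, grads


# Toy data: 3 default boxes, center_features = 2 (the prototxt uses 16).
flat = [1.0, 2.0, 0.0, 0.0, 3.0, 1.0]
per_prior = split_per_prior(flat, center_features=2)

# Assumed matching result: default box index -> class label.
matches = {0: 1, 2: 2}
centers = {1: [0.0, 0.0], 2: [3.0, 0.0]}

matched = sorted(matches)
loss, grads = center_loss([per_prior[i] for i in matched],
                          [matches[i] for i in matched],
                          centers)
print(loss)   # 3.0
print(grads)  # [[1.0, 2.0], [0.0, 1.0]]
```

The gradient with respect to each feature is what propagate_down: true on the fifth bottom would pass back to the center-feature convolution layers.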