This is an unofficial implementation of focal loss from *Focal Loss for Dense Object Detection* (https://arxiv.org/abs/1708.02002), implemented as an MXNet Python custom operator.
Assuming you have put `focal_loss.py` in your operator path, you can use it like this:

```python
from your_operators.focal_loss import *

cls_prob = mx.sym.Custom(op_type='FocalLoss', name='cls_prob', data=cls_score,
                         labels=label, alpha=0.25, gamma=2)
```
These are my experiments on KITTI with 10 classes; the performance on the hard classes improves a lot.
| method (AP@0.7) | car | van | truck | cyclist | pedestrian | person_sitting | tram | misc | dontcare |
|---|---|---|---|---|---|---|---|---|---|
| baseline (Faster R-CNN + OHEM (1:2)) | 0.7892 | 0.7462 | 0.8465 | 0.623 | 0.4254 | 0.1374 | 0.5035 | 0.5007 | 0.1329 |
| Faster R-CNN + focal loss with softmax | 0.797 | 0.874 | 0.8959 | 0.7914 | 0.5700 | 0.2806 | 0.7884 | 0.7052 | 0.1433 |
In my experiments, I had to use the initialization strategy described in section 3.3 of the paper. As the paper notes: "Under such an initialization, in the presence of class imbalance, the loss due to the frequent class can dominate total loss and cause instability in early training."
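For reference, a minimal sketch of that prior-probability initialization under my own assumptions (the paper describes it for a sigmoid classifier with π = 0.01; the layer name `cls_score_bias` below is a hypothetical stand-in, so adapt it to your own network):

```python
import numpy as np
import mxnet as mx

# prior probability pi from section 3.3 of the paper
pi = 0.01
bias_value = float(-np.log((1.0 - pi) / pi))   # approx -4.595

# one (assumed) way to apply it in MXNet: give the final classifier bias a constant init
# and leave every other parameter to Xavier
initializer = mx.init.Mixed(
    ['cls_score_bias', '.*'],
    [mx.init.Constant(bias_value), mx.init.Xavier()]
)
```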
## Or you can try my alternative strategy

Train the model with the classical softmax loss for a few epochs first (for example, 3 epochs on the KITTI dataset), then switch to focal loss with a small learning rate, and the training loss behaves well.
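A rough sketch of that two-phase schedule, assuming `focal_loss.py` is importable; the tiny fully-connected head, the random data, the epoch counts, and the learning rates are all placeholders for your real Faster R-CNN setup:

```python
import numpy as np
import mxnet as mx
from your_operators.focal_loss import *  # registers the 'FocalLoss' custom op

def build_symbol(use_focal, num_classes=10):
    data = mx.sym.Variable('data')
    label = mx.sym.Variable('label')
    # stand-in classifier head; in Faster R-CNN this would be the RCNN cls_score branch
    cls_score = mx.sym.FullyConnected(data=data, num_hidden=num_classes, name='cls_score')
    if use_focal:
        return mx.sym.Custom(op_type='FocalLoss', name='cls_prob',
                             data=cls_score, labels=label, alpha=0.25, gamma=2)
    return mx.sym.SoftmaxOutput(data=cls_score, label=label, name='cls_prob')

# dummy data just so the sketch runs end-to-end; use your real detection iterator instead
x = mx.nd.array(np.random.randn(64, 16).astype('float32'))
y = mx.nd.array(np.random.randint(0, 10, size=(64,)).astype('float32'))
train_iter = mx.io.NDArrayIter(data={'data': x}, label={'label': y}, batch_size=8)

# phase 1: a few epochs with plain softmax to stabilize the classifier
warm = mx.mod.Module(build_symbol(use_focal=False),
                     data_names=['data'], label_names=['label'])
warm.fit(train_iter, num_epoch=3,
         optimizer='sgd', optimizer_params={'learning_rate': 1e-3})

# phase 2: swap in focal loss, reuse the warmed-up weights, and use a small learning rate
arg_params, aux_params = warm.get_params()
focal = mx.mod.Module(build_symbol(use_focal=True),
                      data_names=['data'], label_names=['label'])
focal.fit(train_iter, num_epoch=10,
          arg_params=arg_params, aux_params=aux_params,
          optimizer='sgd', optimizer_params={'learning_rate': 1e-4})
```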
The focal loss value is not computed in `focal_loss.py`, because this layer only needs to forward `cls_prob`; the major task of `focal_loss.py` is to backpropagate the focal loss gradient (a sketch of such an operator is given after the metric example below). The focal loss value itself should be calculated in `metric.py`, with normalization applied there. Note that this layer does not support `use_ignore`.
For example:

```python
import numpy as np
import mxnet as mx


class RCNNLogLossMetric(mx.metric.EvalMetric):
    def __init__(self, cfg):
        super(RCNNLogLossMetric, self).__init__('RCNNLogLoss')
        self.e2e = cfg.TRAIN.END2END
        self.ohem = cfg.TRAIN.ENABLE_OHEM
        # get_rcnn_names comes from the surrounding detection code
        self.pred, self.label = get_rcnn_names(cfg)

    def update(self, labels, preds):
        pred = preds[self.pred.index('rcnn_cls_prob')]
        if self.ohem or self.e2e:
            label = preds[self.pred.index('rcnn_label')]
        else:
            label = labels[self.label.index('rcnn_label')]

        last_dim = pred.shape[-1]
        pred = pred.asnumpy().reshape(-1, last_dim)
        label = label.asnumpy().reshape(-1,).astype('int32')

        # filter with keep_inds: drop ignored samples labelled -1
        keep_inds = np.where(label != -1)[0]
        label = label[keep_inds]
        cls = pred[keep_inds, label]

        cls += 1e-14  # avoid log(0)
        gamma = 2
        alpha = 0.25
        # focal loss: -alpha * (1 - p_t)^gamma * log(p_t), averaged over the kept samples
        cls_loss = alpha * (-1.0 * np.power(1 - cls, gamma) * np.log(cls))
        cls_loss = np.sum(cls_loss) / len(label)
        # print(cls_loss)

        self.sum_metric += cls_loss
        self.num_inst += label.shape[0]
```
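For reference, a minimal sketch of what such a custom operator can look like; this is only an illustration under my assumptions (2-D `(N, C)` scores, per-batch normalization of the gradient, zero gradient for the label input), not the exact contents of `focal_loss.py`. Forward just produces the softmax probabilities; backward writes the focal loss gradient.

```python
import numpy as np
import mxnet as mx


class FocalLoss(mx.operator.CustomOp):
    def __init__(self, alpha, gamma):
        super(FocalLoss, self).__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, is_train, req, in_data, out_data, aux):
        # forward only outputs the class probabilities (cls_prob)
        prob = mx.nd.softmax(in_data[0], axis=-1)
        self.assign(out_data[0], req[0], prob)

    def backward(self, req, out_grad, in_grad, in_data, out_data, aux):
        prob = out_data[0].asnumpy()                       # (N, C) softmax output
        label = in_data[1].asnumpy().reshape(-1).astype('int')
        n = prob.shape[0]
        p_t = prob[np.arange(n), label] + 1e-14            # probability of the true class

        # d(FL)/d(x_j) = alpha * (1 - p_t)^(gamma-1)
        #                * (gamma * p_t * log(p_t) - (1 - p_t)) * (1[j == t] - p_j)
        coef = self.alpha * np.power(1.0 - p_t, self.gamma - 1) \
               * (self.gamma * p_t * np.log(p_t) - (1.0 - p_t))
        one_hot = np.zeros_like(prob)
        one_hot[np.arange(n), label] = 1.0
        grad = coef[:, None] * (one_hot - prob)
        grad /= n  # normalization; an assumption, match it to your metric normalization

        self.assign(in_grad[0], req[0], mx.nd.array(grad, ctx=in_data[0].context))
        self.assign(in_grad[1], req[1], mx.nd.zeros_like(in_data[1]))


@mx.operator.register('FocalLoss')
class FocalLossProp(mx.operator.CustomOpProp):
    def __init__(self, alpha=0.25, gamma=2):
        super(FocalLossProp, self).__init__(need_top_grad=False)
        self.alpha = float(alpha)
        self.gamma = float(gamma)

    def list_arguments(self):
        return ['data', 'labels']

    def list_outputs(self):
        return ['cls_prob']

    def infer_shape(self, in_shape):
        data_shape = in_shape[0]
        label_shape = (in_shape[0][0],)
        output_shape = in_shape[0]
        return [data_shape, label_shape], [output_shape], []

    def create_operator(self, ctx, shapes, dtypes):
        return FocalLoss(self.alpha, self.gamma)
```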
You can check the gradient values while debugging if needed. By the way, here is my derivation of the backward pass; if there is a mistake, please let me know.
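For reference, the gradient of the softmax focal loss can be derived as follows (with `p = softmax(x)`, `t` the ground-truth class, and `p_t` the predicted probability of that class); setting α = 1, γ = 0 recovers the usual softmax cross-entropy gradient `p_j − δ_jt`:

```latex
\begin{aligned}
FL &= -\alpha \,(1 - p_t)^{\gamma} \log(p_t) \\[4pt]
\frac{\partial FL}{\partial p_t}
  &= \alpha \left[ \gamma (1 - p_t)^{\gamma - 1} \log(p_t)
     - \frac{(1 - p_t)^{\gamma}}{p_t} \right] \\[4pt]
\frac{\partial p_t}{\partial x_j} &= p_t \,(\delta_{jt} - p_j) \\[4pt]
\frac{\partial FL}{\partial x_j}
  &= \frac{\partial FL}{\partial p_t}\,\frac{\partial p_t}{\partial x_j}
   = \alpha \,(1 - p_t)^{\gamma - 1}
     \bigl[ \gamma\, p_t \log(p_t) - (1 - p_t) \bigr]
     (\delta_{jt} - p_j)
\end{aligned}
```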