AlexeyAB/darknet

DIoU and CIOU loss implementation

simon-rob opened this issue · 46 comments

https://github.com/Zzh-tju/DIoU-darknet
This is already implemented in the above repository.

Paper: https://arxiv.org/abs/1911.08287v1

loss        AP
IoU loss    46.57
GIoU loss   47.73
DIoU loss   48.10
CIoU loss   49.21
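For reference, the loss definitions from the paper (b and b^gt are the predicted and ground-truth box centers, ρ their Euclidean distance, c the diagonal length of the smallest box enclosing both boxes):

L_{IoU}  = 1 - IoU
L_{DIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2}
L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v,
\quad v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2,
\quad \alpha = \frac{v}{(1 - IoU) + v}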

Looks like a really nice big "free" performance boost! Can't wait to test it!

I added CIoU and DIoU to both [yolo] and [Gaussian_yolo] layers. But didn't test yet.

@LukeAI Did you test iou_thresh=0.213 or 0.3?

If you tested it, please give me the corresponding cfg file. Thanks.

@tuteming I fixed several bugs.

cfg-file with [Gaussian_yolo] + CIoU: yolov3-tiny_pan_ciou.cfg.txt
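For anyone who wants to try it without reading the whole cfg: a minimal sketch of the options involved (option names assume a recent AlexeyAB/darknet parser; the values are only examples, not recommendations):

[yolo]
# box-regression loss: mse (default) / iou / giou / diou / ciou
iou_loss=ciou

# optional DIoU-aware NMS for the last [yolo] layer (see box_diounms below)
nms_kind=diounms
beta_nms=0.6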

chart

Very interesting!! I will try to implement DIoU and CIoU in ultralytics/yolov3 and test on COCO vs our default GIoU.

Hi @AlexeyAB I'm confused about the "scale_x_y" param. In this "yolov3-tiny_pan_ciou.cfg.txt", the "scale_x_y" params are as follows:

[Gaussian_yolo]
mask = 0,1,2,3,4
anchors = 8,8, 10,13, 16,30, 33,23, 32,32, 30,61, 62,45, 64,64, 59,119, 116,90, 156,198, 373,326
scale_x_y = 1.05

[Gaussian_yolo]
mask = 4,5,6,7,8
anchors = 8,8, 10,13, 16,30, 33,23, 32,32, 30,61, 62,45, 64,64, 59,119, 116,90, 156,198, 373,326
scale_x_y = 1.1

[Gaussian_yolo]
mask = 8,9,10,11
anchors = 8,8, 10,13, 16,30, 33,23, 32,32, 30,61, 62,45, 59,119, 80,80, 116,90, 156,198, 373,326
scale_x_y = 1.2

But in your topic "Sensitivity Effects Near Grid Boundaries (Experimental Results) ~+1 AP@[.5, .95]" #3293, you state the rule for setting the "scale_x_y" param as follows:

1.05 * logistic - 0.025 - for yolo layer (large objects) scale_x_y = 1.05 in cfg-file
1.1 * logistic - 0.05 - for yolo layer (medium objects) scale_x_y = 1.1 in cfg-file
1.2 * logistic - 0.1 - for yolo layer (small objects) scale_x_y = 1.2 in cfg-file

So my question is: for large objects (mask = 8,9,10,11), according to the rule the value of scale_x_y should be 1.05, which conflicts with the "yolov3-tiny_pan_ciou.cfg.txt" setting. Which one is right?

@lq0104 Yes, my mistake )

Should be:

1.05 * logistic - 0.025 - for yolo layer (large objects) scale_x_y = 1.05 in cfg-file
1.1 * logistic - 0.05 - for yolo layer (medium objects) scale_x_y = 1.1 in cfg-file
1.2 * logistic - 0.1 - for yolo layer (small objects) scale_x_y = 1.2 in cfg-file

Or 1.1 for all yolo-layers.
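To make the rule concrete: the subtracted constant is just (scale_x_y - 1) / 2, so the logistic output is stretched symmetrically around 0.5 and can actually reach the cell borders 0 and 1. A minimal sketch (illustrative names, not the exact darknet variables):

#include <math.h>

/* Sketch of the box-center decoding with scale_x_y. 1.05*logistic - 0.025,
 * 1.1*logistic - 0.05 and 1.2*logistic - 0.1 are all instances of
 * s*logistic(t) - (s - 1)/2.                                               */
static float decode_center(float t, float scale_x_y)
{
    float sig = 1.0f / (1.0f + expf(-t));                 /* logistic activation */
    return scale_x_y * sig - (scale_x_y - 1.0f) / 2.0f;   /* stretched around 0.5 */
}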

I tested the 3 box regression methods below on https://github.com/ultralytics/yolov3 using yolov3-spp.cfg with swish, trained on full COCO2014 for 27 epochs each, but did not see performance improvements from the new methods. I'll try again with LeakyReLU(0.1). The IoU function I implemented is here:

python3 train.py --weights '' --epochs 27 --batch-size 16 --accumulate 4 --prebias --cfg cfg/yolov3s.cfg
        mAP@0.5   mAP@0.5:0.95   Epoch time on 2080Ti
GIoU    49.7      30.2           36min
DIoU    49.4      30.0           36min
CIoU    49.7      30.1           36min

@glenn-jocher

When using CIoU, does the model converge faster and does accuracy increase faster during training? Or is there no change at all?


They also introduced a new NMS function, which can be enabled in the last [yolo] layer by setting

nms_kind = greedynms
beta_nms = 0.6

Mainly it introduces the beta1 parameter (beta_nms in the cfg) - source code:

darknet/src/box.c

Lines 201 to 218 in b832c72

float box_diounms(box a, box b, float beta1)
{
    boxabs ba = box_c(a, b);                 // smallest box enclosing a and b
    float w = ba.right - ba.left;
    float h = ba.bot - ba.top;
    float c = w * w + h * h;                 // squared diagonal of the enclosing box
    float iou = box_iou(a, b);
    if (c == 0) {
        return iou;
    }
    float d = (a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y);   // squared center distance
    float u = pow(d / c, beta1);             // beta1 (beta_nms in cfg) shapes the distance penalty
    float diou_term = u;
#ifdef DEBUG_PRINTS
    printf(" c: %f, u: %f, riou_term: %f\n", c, u, diou_term);
#endif
    return iou - diou_term;                  // DIoU value = IoU minus normalized center distance
}


nms_kind == CORNERS_NMS is my experiment

darknet/src/box.c

Lines 855 to 917 in b832c72

void diounms_sort(detection *dets, int total, int classes, float thresh, NMS_KIND nms_kind, float beta1)
{
    int i, j, k;
    k = total - 1;
    // move detections with zero objectness to the end of the array
    for (i = 0; i <= k; ++i) {
        if (dets[i].objectness == 0) {
            detection swap = dets[i];
            dets[i] = dets[k];
            dets[k] = swap;
            --k;
            --i;
        }
    }
    total = k + 1;

    for (k = 0; k < classes; ++k) {
        // sort detections by probability of class k
        for (i = 0; i < total; ++i) {
            dets[i].sort_class = k;
        }
        qsort(dets, total, sizeof(detection), nms_comparator_v3);
        for (i = 0; i < total; ++i)
        {
            if (dets[i].prob[k] == 0) continue;
            box a = dets[i].bbox;
            for (j = i + 1; j < total; ++j) {
                box b = dets[j].bbox;
                if (box_iou(a, b) > thresh && nms_kind == CORNERS_NMS)
                {
                    float sum_prob = pow(dets[i].prob[k], 2) + pow(dets[j].prob[k], 2);
                    float alpha_prob = pow(dets[i].prob[k], 2) / sum_prob;
                    float beta_prob = pow(dets[j].prob[k], 2) / sum_prob;
                    //dets[i].bbox.x = (dets[i].bbox.x*alpha_prob + dets[j].bbox.x*beta_prob);
                    //dets[i].bbox.y = (dets[i].bbox.y*alpha_prob + dets[j].bbox.y*beta_prob);
                    //dets[i].bbox.w = (dets[i].bbox.w*alpha_prob + dets[j].bbox.w*beta_prob);
                    //dets[i].bbox.h = (dets[i].bbox.h*alpha_prob + dets[j].bbox.h*beta_prob);
                    /*
                    if (dets[j].points == YOLO_CENTER && (dets[i].points & dets[j].points) == 0) {
                        dets[i].bbox.x = (dets[i].bbox.x*alpha_prob + dets[j].bbox.x*beta_prob);
                        dets[i].bbox.y = (dets[i].bbox.y*alpha_prob + dets[j].bbox.y*beta_prob);
                    }
                    else if ((dets[i].points & dets[j].points) == 0) {
                        dets[i].bbox.w = (dets[i].bbox.w*alpha_prob + dets[j].bbox.w*beta_prob);
                        dets[i].bbox.h = (dets[i].bbox.h*alpha_prob + dets[j].bbox.h*beta_prob);
                    }
                    dets[i].points |= dets[j].points;
                    */
                    dets[j].prob[k] = 0;
                }
                else if (box_iou(a, b) > thresh && nms_kind == GREEDY_NMS) {
                    dets[j].prob[k] = 0;   // classic greedy NMS: suppress by plain IoU
                }
                else {
                    if (box_diounms(a, b, beta1) > thresh && nms_kind == DIOU_NMS) {
                        dets[j].prob[k] = 0;   // DIoU-NMS: suppress by IoU minus center-distance term
                    }
                }
            }
            //if ((nms_kind == CORNERS_NMS) && (dets[i].points != (YOLO_CENTER | YOLO_LEFT_TOP | YOLO_RIGHT_BOTTOM)))
            //    dets[i].prob[k] = 0;
        }
    }
}
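A minimal usage sketch of the function above (argument values are only examples; dets/nboxes would normally come from darknet's detection-fetching code):

// Hypothetical call site for the routine above; 0.45 is an example
// suppression threshold and 0.6 matches the beta_nms default in this thread.
diounms_sort(dets, nboxes, classes, 0.45f, DIOU_NMS, 0.6f);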

Oddly enough, the 3 really didn't differ much throughout the entire training. I did see CIoU help out more on a small custom dataset I had, so perhaps it helps more when there is less training data. Results 73, 74 and 75 are GIoU, DIoU and CIoU respectively.

results

Did you see any difference when using the different NMS techniques? I tried to implement soft-nms before without success.

@glenn-jocher

  • Did you check AP@0.75 and AP@0.5...0.95 ?
  • Did you check AP@0.50, AP@0.75 and AP@0.5...0.95 with confidence_threshold=0.1 instead of 0.001?

Maybe CIoU and DIoU improve only AP@0.75 - AP@0.95, just like GIoU: #3249 (comment)


I just tested mAP@0.5 on the default yolov3-spp.cfg / weights 608x608:
./darknet detector map cfg/coco.data cfg/yolov3-spp.cfg yolov3-spp.weights -points 101

  • default NMS: 59.28% mAP@0.5
  • nms_kind = diounms beta_nms=0.8: 59.40% mAP@0.5 so +0.12
  • nms_kind = diounms beta_nms=0.7: 59.43% mAP@0.5 so +0.15
  • nms_kind = diounms beta_nms=0.6: 59.46% mAP@0.5 so +0.18 <---
  • nms_kind = diounms beta_nms=0.5: 59.40% mAP@0.5 so +0.12
  • nms_kind = diounms beta_nms=0.4: 59.17% mAP@0.5 so -0.11

(beta_nms=0.6 by default)

Maybe if yolov3-spp.cfg is trained with CIoU and then nms_kind = DIOU_NMS is used, or if we use a lower beta, it can get a better result.

CIOU hurt the mAP in my run
#4147 (comment)

@AlexeyAB
As shown in #4360 (comment)

Should be:

1.05 * logistic - 0.025 - for yolo layer (large objects) scale_x_y = 1.05 in cfg-file
1.1 * logistic - 0.05 - for yolo layer (medium objects) scale_x_y = 1.1 in cfg-file
1.2 * logistic - 0.1 - for yolo layer (small objects) scale_x_y = 1.2 in cfg-file

Or 1.1 for all yolo-layers.

But I found that in Gaussian_yolov3_BDD.cfg the value of scale_x_y is set to 1.0 for all 3 yolo-layers.

Which one of the following should I set in my yolov3+Gaussian+CIoU.cfg?

  • scale_x_y = 1.05, 1.1, 1.2 for different yolo-layers?

  • scale_x_y = 1.1 for all 3 yolo-layers?

  • scale_x_y = 1.0 for all 3 yolo-layers?

Or should I train 3 times and then choose the one with the best mAP?

Honestly guys, I don't understand how GIoU, DIoU, CIoU are better than just IoU for the YOLOv3 framework. The main problem they all address compared to vanilla IoU is that the loss gives no gradient when the intersection of an anchor and a target is 0. However, in the YOLO framework the intersection is always non-zero - we pick the anchor with the largest IoU with the target; otherwise the box loss doesn't contribute to the overall loss.

So I'm sceptical about these papers and reported results. At least in the experiments using my own pytorch yolo implementation with custom datasets I'm getting worse results with *IOU losses compared to original yolo loss.

@LukeAI Try to use these hyperparameters: #4430

@nyj-ocean

It isn't tested well, so use scale_x_y = 1.0, which is the default.

Or try scale_x_y = 1.05, 1.1, 1.2 for different yolo-layers?


    1.05 * logistic - 0.025 - for yolo layer (large objects) scale_x_y = 1.05 in cfg-file
    1.1 * logistic - 0.05 - for yolo layer (medium objects) scale_x_y = 1.1 in cfg-file
    1.2 * logistic - 0.1 - for yolo layer (small objects) scale_x_y = 1.2 in cfg-file

@dselivanov

Honestly guys, I don't understand how GIoU, DIoU, CIoU are better than just IoU for the YOLOv3 framework. The main problem they all address compared to vanilla IoU is that the loss gives no gradient when the intersection of an anchor and a target is 0. However, in the YOLO framework the intersection is always non-zero - we pick the anchor with the largest IoU with the target; otherwise the box loss doesn't contribute to the overall loss.

I also have doubts about the effectiveness of C/D/GIoU. Perhaps they simply generate a higher delta and therefore improve the results at AP75 and reduce the results at AP50. Maybe we can achieve the same effect with the default Yolo-loss (without GIoU) and with iou_normalizer=10.

But:

  1. The default Yolo-loss doesn't use IoU, so maybe IoU-loss (and C/D/GIoU) is better than the default Yolo-loss
  2. GIoU increases AP75 (but decreases AP50) https://github.com/WongKinYiu/CrossStagePartialNetworks#gpu-real-time-models

So I'm sceptical about these papers and reported results. At least in the experiments using my own pytorch yolo implementation with custom datasets I'm getting worse results with *IOU losses compared to original yolo loss.

This Pytorch-Yolov3 uses GIoU: https://github.com/ultralytics/yolov3

The default Yolo-loss doesn't use IoU, so maybe IoU-loss (and C/D/GIoU) is better than the default Yolo-loss

Yeah, of course I'm aware of that. Here is an example of what I'm getting (without NMS):

DIoU loss

diou-loss

YOLO loss

yolo-loss

@dselivanov

I don't know how good DIoU works with rotated bboxes.
How many anchors do you use?
What Loss do you use for angle?

9 anchors. Angle loss is just another separate loss component, so total loss = box loss + angle loss + class loss + confidence loss. I predict rotation between 0 and pi. The issue here is not with rotation, but with width and height...

EDIT - results below are not correct due to a bug in my code - see update here

Update from me. I'm running a benchmark on a dataset of satellite images.

  • I'm detecting small objects like cars, boats, planes, etc., with angle.
  • All hyper-parameters except the loss are the same across experiments.

So my conclusions:

  • DIoU loss is no better than IoU (for reasons I've described here)
  • for small objects *IoU is MUCH WORSE than YOLO loss.
  • I believe *IoU does not do so well because it is much harder to optimize the *IoU loss directly (noisy gradients) compared to the YOLO loss, which is simple.
class           ap_iou     ap_diou    ap_yolo
large-vehicle   0.088707   0.0940     0.4124
plane           0.174664   0.1540     0.6826
ship            0.157804   0.1779     0.5553
small-vehicle   0.048405   0.0488     0.2120
storage-tank    0.152680   0.15056    0.4130

It seems both *IoU losses tend to make objects "square-ish":
IoU loss
In contrast YOLO-loss can properly optimize width and height:
YOLO loss

@dselivanov

It seems both *IoU losses tend to make objects "square-ish":

It is very interesting result.

  • Are you sure that your implementation of IoU and DIoU is correct?

  • Did you try GIoU? Default-MSE and GIoU are the most tested losses on many datasets, and GIoU reliably gives a significant improvement in AP@0.5...0.95 / AP@0.75 on MS COCO (but decreases AP@0.5)

  • Is it possible that the two losses, Angle-loss and DIoU-loss, interfere with each other? Both compete for a network change, and DIoU loses.

@AlexeyAB @dselivanov it's important to always balance your losses so that they are of magnitude proportional to their importance to your overall solution.

Therefore, you cannot simply swap the MSE loss for GIoU or any other loss and start training. First you should train for one epoch, observe the change in loss magnitude the swap created, and then multiply it by a gain to bring it back to the same magnitude as your original implementation.
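A minimal sketch of that rebalancing idea (pure illustration, not code from either repo; the function name and the one-epoch averages are hypothetical):

/* Illustration only: after swapping the MSE box loss for a *IoU loss,
 * rescale the box term so its magnitude matches the old implementation.
 * old_box_avg / new_box_avg are the average box-loss magnitudes measured
 * over ~1 epoch before and after the swap.                               */
float balanced_total_loss(float box_loss, float obj_loss, float cls_loss,
                          float old_box_avg, float new_box_avg)
{
    float box_gain = old_box_avg / new_box_avg;   /* gain that restores the old scale */
    return box_gain * box_loss + obj_loss + cls_loss;
}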

@glenn-jocher I've tried to play with coefficients and give more weight to box loss. Didn't help much.
@glenn-jocher as an independent implementation, could you try using just IoU loss to test my idea that for YOLO it should be no worse than the other *IoU losses?

@dselivanov are you saying I should try to train with IoU to add to the existing G/D/CIoU table from #4360 (comment)?

Are you looking at the individual loss component magnitudes as in #4360 (comment)?

@glenn-jocher yes, I think it will be on par with other losses.

Are you looking at the individual loss component magnitudes

Yes

@nyj-ocean The basic idea is the same as in the [Gaussian_yolo] layer - to use the confidence score of bounding boxes: #4147

Is it better than the [Gaussian_yolo] layer?

@AlexeyAB

Are you sure that your implementation of IoU and DIoU is correct?

Indeed the loss was correct, but I had another bug - I scaled the anchors twice... Now re-running experiments. Will keep you posted.

@AlexeyAB @glenn-jocher. I've fixed the bug in my code and now it seems:

  • optimizing IoU directly clearly gives much better results compared to YOLO proxy loss
  • DIoU loss is no better than IoU
class           ap_iou     ap_diou    ap_yolo
large-vehicle   0.470619   0.475856   0.4124
plane           0.703620   0.707864   0.6826
ship            0.629271   0.628140   0.5553
small-vehicle   0.268684   0.270890   0.2120
storage-tank    0.463676   0.467555   0.4130

@dselivanov Nice!
What about CIoU?
What activation do you use for angle-Loss?

Nice!

Btw this is ap@iou>=0.5

What about CIoU?

I haven't tried it and I think I will not. I believe my comment here is still valid.

What activation do you use for angle-Loss?

Do you mean how the loss is calculated? I predict the angle between the X axis and the largest side (the width, by convention).

@dselivanov

Do you mean how the loss is calculated? I predict the angle between the X axis and the largest side (the width, by convention).

Thanks! Do you use Linear activation for Angle, or Logistic as for x,y,obj,class, or EXP as for w,h?

Thanks! Do you use Linear activation for Angle, or Logistic as for x,y,obj,class, or EXP as for w,h?

Logistic, because the angle is bounded by 0 and pi. So angle = pi * sigmoid(yolo_layer_output)

@dselivanov
Have you tried using trigonometric activation functions for an angle?
Like:

  • cos, sin - min=-1 max=1
  • tanh or th (hyperbolic tangent) - min=-1 max=1 - is implemented: activation=tanh
  • sech (hyperbolic secant) - min=0 max=1

More: https://en.wikipedia.org/wiki/Hyperbolic_function
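A small sketch of the cos/sin idea (illustrative only, not from either codebase): predict two outputs bounded to [-1, 1], e.g. with activation=tanh, and recover the angle with atan2:

#include <math.h>

/* Hypothetical decoding for an orientation angle in [0, pi): predict
 * ~cos(angle) and ~sin(angle) via tanh and fold the atan2 result into
 * [0, pi), since a box side's orientation is periodic by pi.           */
static float decode_angle(float t_cos, float t_sin)
{
    float c = tanhf(t_cos);                 /* in [-1, 1]   */
    float s = tanhf(t_sin);                 /* in [-1, 1]   */
    float angle = atan2f(s, c);             /* in (-pi, pi] */
    if (angle < 0) angle += 3.14159265f;    /* fold into [0, pi) */
    return angle;
}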

@AlexeyAB

Is it better than the [Gaussian_yolo] layer?

I am not sure

@AlexeyAB @glenn-jocher @dselivanov thanks for the detailed discussion you are currently having. I had a few queries:
1. Be it IoU, DIoU or CIoU, does it take into consideration the rotation of the object, i.e. the orientation of the object?
2. @dselivanov is showing that standard IoU is best for his application; can we generalise that to other domain applications as well?
3. Can these IoU/DIoU/CIoU features be applied to other similar YOLO architectures like YOLACT, Complex-YOLO and Complexer-YOLO?

Thanks in advance

@abhigoku10

  1. CIoU has the best AP0.5...0.95 and AP75: https://github.com/WongKinYiu/CrossStagePartialNetworks#some-tricks-for-improving-ap

  2. @dselivanov is proving that DIoU is slightly better than IoU

  3. Yes.

@AlexeyAB thanks for your response, but do these methods take the rotation/orientation of the object into consideration?

No.
@dselivanov implemented rotation/orientation independently by himself in Pytorch.

@dselivanov can you please share the rotation section of the CIoU/DIoU?

@dselivanov thanks for confirming this. I just wanted to know whether including the rotation/orientation param in the loss calculation is effective or not.

@simon-rob @AlexeyAB @LukeAI @glenn-jocher can anyone please tell me how I can change the loss function for training a classification model while using this repo?