Trying to understand my results on a custom dataset
jackonealll opened this issue · 9 comments
Hi,
I am studying your shared code using a custom dataset.
Here are the key points:
- I split my dataset into train, valid, and test sets.
- I trained for 100 epochs (I saw a recommendation to run it for 50 epochs) using AnchorDETR-DC ResNet-101 weights for transfer learning.
I would like to understand why the performance of my training starts to get worse after 50 epochs (overfitting?).
Running the validation code on my test dataset, I got the following results using the saved weights:
Saved weights of epoch: 0049
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.599
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.847
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.637
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.516
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.624
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.632
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.677
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.703
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.616
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.727
Saved weights of epoch: 0099
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.558
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.806
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.607
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.426
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.594
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.620
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.653
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.663
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.504
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.702
Why are the results at epoch 49 better than the results at epoch 99?
Is there an option to save only the best weights?
Hi, I think your custom dataset may be too small, so the model is overfitting. You can stop the training early when the accuracy does not improve for several epochs.
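A minimal early-stopping / best-checkpoint sketch (the `train_fn` and `eval_fn` callables are hypothetical stand-ins for the repo's per-epoch training and validation steps, so this is illustrative rather than a drop-in patch):

```python
import torch

def train_with_early_stopping(model, train_fn, eval_fn, epochs=100,
                              patience=10, ckpt_path='best_checkpoint.pth'):
    """train_fn(epoch) runs one training epoch; eval_fn() returns val AP@[0.5:0.95]."""
    best_ap, bad_epochs = 0.0, 0
    for epoch in range(epochs):
        train_fn(epoch)
        ap = eval_fn()
        if ap > best_ap:
            # Improvement: save the best weights and reset the patience counter.
            best_ap, bad_epochs = ap, 0
            torch.save({'model': model.state_dict(), 'epoch': epoch}, ckpt_path)
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                print(f'early stop at epoch {epoch}, best AP {best_ap:.3f}')
                break
    return best_ap
```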
My dataset has this setup: train (6k images), valid (200 images) and test (700 images).
I will try to increase the data augmentation on the train dataset.
Thank you!
@tangjiuqi097 I am experiencing strange behavior.
As suggested, I applied additional data augmentation. Now I have 27k images in the train dataset.
Before, the ratio was 1:4; now it is 1:20.
But before, I got an AP@0.50 of 90%; now, after the data augmentation, my AP@0.50 is 83%.
Is it possible that more images can negatively influence the CNN training?
Below are the properties used for data augmentation (Keras `ImageDataGenerator` arguments):

```python
data_gen_args = dict(
    brightness_range=[0.4, 1.5],   # random brightness between 40% and 150%
    rotation_range=20,             # random rotations up to 20 degrees
    zoom_range=0.2,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='constant',          # fill exposed pixels with a constant value
    zca_whitening=True,
    cval=0,
)
```
More images should give better performance, but some of these data augmentations may not be suitable.
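For example, `zca_whitening` and `vertical_flip` are often poor fits for natural images, and strong shear/shift ranges can push objects out of the frame. A more conservative configuration might look like the sketch below (illustrative only, not a verified recipe; also note that Keras' `ImageDataGenerator` does not transform bounding boxes, so for detection a box-aware augmentation library is usually needed):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative, more conservative setup: drops ZCA whitening and vertical
# flips, and softens the geometric/photometric ranges.
conservative_args = dict(
    brightness_range=[0.8, 1.2],
    rotation_range=10,
    zoom_range=0.1,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    fill_mode='constant',
    cval=0,
)
datagen = ImageDataGenerator(**conservative_args)
```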
@tangjiuqi097 Thank you again!
I have one more question. Could you please explain the purpose of line 94 of the engine.py file?
Line 94 in f2be216
I am asking because I ran some tests changing this line and got strange results. In short, decreasing the IoU gives me better results, but I thought it would make my results worse.
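For context, in DETR-style codebases that line typically overrides the IoU thresholds the COCO evaluator averages over; a hypothetical reconstruction (the exact code at line 94 of f2be216 may differ):

```python
import numpy as np

# engine.py (around line 94), presumably: when uncommented, this replaces the
# default COCO IoU sweep 0.50:0.95 with a custom set of thresholds,
# e.g. evaluating only at IoU=0.5.
coco_evaluator.coco_eval[iou_type].params.iouThrs = np.array([0.5])
```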
IoU 0.50:0.95: bbox (ep 49): Line 94 commented
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.587
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.904
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.644
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.435
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.628
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.604
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.702
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.763
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.567
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.812
IoU 0.00:0.75: bbox (ep 49): Line 94 uncommented
Average Precision (AP) @[ IoU=0.00:0.75 | area= all | maxDets=100 ] = 0.872
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.00:0.75 | area= small | maxDets=100 ] = 0.250
Average Precision (AP) @[ IoU=0.00:0.75 | area=medium | maxDets=100 ] = 0.766
Average Precision (AP) @[ IoU=0.00:0.75 | area= large | maxDets=100 ] = 0.902
Average Recall (AR) @[ IoU=0.00:0.75 | area= all | maxDets= 1 ] = 0.853
Average Recall (AR) @[ IoU=0.00:0.75 | area= all | maxDets= 10 ] = 0.925
Average Recall (AR) @[ IoU=0.00:0.75 | area= all | maxDets=100 ] = 0.952
Average Recall (AR) @[ IoU=0.00:0.75 | area= small | maxDets=100 ] = 0.250
Average Recall (AR) @[ IoU=0.00:0.75 | area=medium | maxDets=100 ] = 0.866
Average Recall (AR) @[ IoU=0.00:0.75 | area= large | maxDets=100 ] = 0.976
IoU 0.5: bbox (ep 49): Line 94 uncommented [0.5]
Average Precision (AP) @[ IoU=0.50:0.50 | area= all | maxDets=100 ] = 0.904
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.50 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.50 | area=medium | maxDets=100 ] = 0.867
Average Precision (AP) @[ IoU=0.50:0.50 | area= large | maxDets=100 ] = 0.922
Average Recall (AR) @[ IoU=0.50:0.50 | area= all | maxDets= 1 ] = 0.870
Average Recall (AR) @[ IoU=0.50:0.50 | area= all | maxDets= 10 ] = 0.944
Average Recall (AR) @[ IoU=0.50:0.50 | area= all | maxDets=100 ] = 0.971
Average Recall (AR) @[ IoU=0.50:0.50 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.50 | area=medium | maxDets=100 ] = 0.904
Average Recall (AR) @[ IoU=0.50:0.50 | area= large | maxDets=100 ] = 0.992
IoU 0.4: bbox (ep 49): Line 94 uncommented [0.4]
Average Precision (AP) @[ IoU=0.30:0.30 | area= all | maxDets=100 ] = 0.943
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.30:0.30 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.30:0.30 | area=medium | maxDets=100 ] = 0.892
Average Precision (AP) @[ IoU=0.30:0.30 | area= large | maxDets=100 ] = 0.961
Average Recall (AR) @[ IoU=0.30:0.30 | area= all | maxDets= 1 ] = 0.909
Average Recall (AR) @[ IoU=0.30:0.30 | area= all | maxDets= 10 ] = 0.971
Average Recall (AR) @[ IoU=0.30:0.30 | area= all | maxDets=100 ] = 0.978
Average Recall (AR) @[ IoU=0.30:0.30 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.30:0.30 | area=medium | maxDets=100 ] = 0.921
Average Recall (AR) @[ IoU=0.30:0.30 | area= large | maxDets=100 ] = 0.998
IoU 0.3: bbox (ep 49): Line 94 uncommented [0.3]
Average Precision (AP) @[ IoU=0.30:0.30 | area= all | maxDets=100 ] = 0.943
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.30:0.30 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.30:0.30 | area=medium | maxDets=100 ] = 0.892
Average Precision (AP) @[ IoU=0.30:0.30 | area= large | maxDets=100 ] = 0.961
Average Recall (AR) @[ IoU=0.30:0.30 | area= all | maxDets= 1 ] = 0.909
Average Recall (AR) @[ IoU=0.30:0.30 | area= all | maxDets= 10 ] = 0.971
Average Recall (AR) @[ IoU=0.30:0.30 | area= all | maxDets=100 ] = 0.978
Average Recall (AR) @[ IoU=0.30:0.30 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.30:0.30 | area=medium | maxDets=100 ] = 0.921
Average Recall (AR) @[ IoU=0.30:0.30 | area= large | maxDets=100 ] = 0.998
IoU 0.2: bbox (ep 49): Line 94 uncommented [0.2]
Average Precision (AP) @[ IoU=0.20:0.20 | area= all | maxDets=100 ] = 0.956
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.20:0.20 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.20:0.20 | area=medium | maxDets=100 ] = 0.894
Average Precision (AP) @[ IoU=0.20:0.20 | area= large | maxDets=100 ] = 0.978
Average Recall (AR) @[ IoU=0.20:0.20 | area= all | maxDets= 1 ] = 0.920
Average Recall (AR) @[ IoU=0.20:0.20 | area= all | maxDets= 10 ] = 0.972
Average Recall (AR) @[ IoU=0.20:0.20 | area= all | maxDets=100 ] = 0.981
Average Recall (AR) @[ IoU=0.20:0.20 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.20:0.20 | area=medium | maxDets=100 ] = 0.939
Average Recall (AR) @[ IoU=0.20:0.20 | area= large | maxDets=100 ] = 0.998
IoU 0.1: bbox (ep 49): Line 94 uncommented [0.1]
Average Precision (AP) @[ IoU=0.10:0.10 | area= all | maxDets=100 ] = 0.963
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.10:0.10 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.10:0.10 | area=medium | maxDets=100 ] = 0.897
Average Precision (AP) @[ IoU=0.10:0.10 | area= large | maxDets=100 ] = 0.987
Average Recall (AR) @[ IoU=0.10:0.10 | area= all | maxDets= 1 ] = 0.930
Average Recall (AR) @[ IoU=0.10:0.10 | area= all | maxDets= 10 ] = 0.977
Average Recall (AR) @[ IoU=0.10:0.10 | area= all | maxDets=100 ] = 0.983
Average Recall (AR) @[ IoU=0.10:0.10 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.10:0.10 | area=medium | maxDets=100 ] = 0.947
Average Recall (AR) @[ IoU=0.10:0.10 | area= large | maxDets=100 ] = 0.998
IoU 0.0: bbox (ep 49): Line 94 uncommented [0.0] (impossible!!!)
Average Precision (AP) @[ IoU=0.00:0.00 | area= all | maxDets=100 ] = 0.976
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.00:0.00 | area= small | maxDets=100 ] = 1.000
Average Precision (AP) @[ IoU=0.00:0.00 | area=medium | maxDets=100 ] = 0.950
Average Precision (AP) @[ IoU=0.00:0.00 | area= large | maxDets=100 ] = 0.989
Average Recall (AR) @[ IoU=0.00:0.00 | area= all | maxDets= 1 ] = 0.946
Average Recall (AR) @[ IoU=0.00:0.00 | area= all | maxDets= 10 ] = 0.998
Average Recall (AR) @[ IoU=0.00:0.00 | area= all | maxDets=100 ] = 0.998
Average Recall (AR) @[ IoU=0.00:0.00 | area= small | maxDets=100 ] = 1.000
Average Recall (AR) @[ IoU=0.00:0.00 | area=medium | maxDets=100 ] = 1.000
Average Recall (AR) @[ IoU=0.00:0.00 | area= large | maxDets=100 ] = 0.998
@jackonealll hi,
If you use a smaller IoU threshold, low-quality predictions can be treated as true positives, so the reported performance will be better.
This line can be used to look at results under different IoU thresholds. The default metric evaluates performance for IoU thresholds in 0.5:0.95, and of course we should remove this line to follow the default COCO metric.
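For reference, pycocotools builds the default detection IoU thresholds like this:

```python
import numpy as np

# Default COCO detection IoU thresholds (see pycocotools' Params.setDetParams):
iou_thrs = np.linspace(0.5, 0.95, int(np.round((0.95 - 0.5) / 0.05)) + 1, endpoint=True)
print(iou_thrs)  # [0.5  0.55 0.6  0.65 0.7  0.75 0.8  0.85 0.9  0.95]
```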
I understand your point. But I think that if I use a smaller IoU, more false positives would be found, which should negatively impact the other metrics like precision and F1-score. In my case, though, the results get better.
Below are further results with all metrics as the IoU value changes.
| IoU threshold | 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
|---|---|---|---|---|---|---|
| Average Precision | 0.9790000 | 0.9600000 | 0.9550000 | 0.9450000 | 0.9260000 | 0.9010000 |
| Recall | 0.9796875 | 0.9746032 | 0.9650794 | 0.9470405 | 0.9479675 | 0.9268293 |
| Precision | 0.9700000 | 0.9500000 | 0.9400000 | 0.9400000 | 0.9000000 | 0.8800000 |
| F1-score | 0.9748197 | 0.9621443 | 0.9523746 | 0.9435071 | 0.9233612 | 0.9028078 |
| Average Recall | 1.0000000 | 0.9980000 | 0.9980000 | 0.9980000 | 0.9750000 | 0.9670000 |
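As a sanity check, the F1-score row follows directly from the precision and recall rows via F1 = 2PR/(P+R); for example, at IoU 0:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Check against the IoU=0 column of the table above:
print(f1(0.97, 0.9796875))  # ≈ 0.9748197, matching the F1-score row
```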
Also, I agree with you that we should use the 0.5:0.95 IoU threshold range. But in my experiments I need to use a specific IoU like 0.5, which is why I am trying to understand this code, so I can use it the correct way.
> if I use a smaller IoU, more false positives can be found
This IoU threshold is used to evaluate the predictions; it is not the threshold for label assignment.
Maybe what you mean is the confidence score threshold.
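To illustrate the difference: the confidence score threshold filters which predictions are kept at all, before any IoU matching happens. A minimal sketch (assuming COCO-style detection result dicts with a `'score'` key):

```python
def filter_by_score(detections, score_thr=0.5):
    """Keep only predictions whose confidence is at least score_thr.

    `detections` is assumed to be a list of COCO-style result dicts,
    each with a 'score' key; raising score_thr trades recall for
    precision at any fixed evaluation IoU threshold.
    """
    return [d for d in detections if d['score'] >= score_thr]
```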
This issue has not been active for a long time and will be closed in 5 days. Feel free to re-open it if you have further concerns.