Trying to understand my results on a custom dataset
jackonealll opened this issue · 9 comments
Hi,
I am studying your shared code using a custom dataset.
Here are the key points:
- I split my dataset into train, valid, and test sets.
- I trained for 100 epochs (I saw a recommendation to run it for 50 epochs) using AnchorDETR-DC ResNet-101 weights for transfer learning.
I would like to understand why the performance of my training starts to get worse after 50 epochs (overfitting?).
Running the validation code on my test dataset, I got the following results using the saved weights:
Saved weights of epoch: 0049
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.599
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.847
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.637
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.516
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.624
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.632
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.677
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.703
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.616
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.727
Saved weights of epoch: 0099
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.558
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.806
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.607
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.426
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.594
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.620
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.653
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.663
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.504
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.702
Why are the results at epoch 49 better than the results at epoch 99?
Is there an option to save only the best weights?
Hi, I think your custom dataset may be too small, so the model is overfitting. You can stop the training early when the accuracy does not improve for several epochs.
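A minimal early-stopping / best-checkpoint sketch (the `train_fn` and `eval_fn` callables are hypothetical stand-ins for the repo's per-epoch training and validation steps, so this is illustrative rather than a drop-in patch):

```python
import torch

def train_with_early_stopping(model, train_fn, eval_fn, epochs=100,
                              patience=10, ckpt_path='best_checkpoint.pth'):
    """train_fn(epoch) runs one training epoch; eval_fn() returns val AP@[0.5:0.95]."""
    best_ap, bad_epochs = 0.0, 0
    for epoch in range(epochs):
        train_fn(epoch)
        ap = eval_fn()
        if ap > best_ap:
            # Improvement: save the best weights and reset the patience counter.
            best_ap, bad_epochs = ap, 0
            torch.save({'model': model.state_dict(), 'epoch': epoch}, ckpt_path)
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                print(f'early stop at epoch {epoch}, best AP {best_ap:.3f}')
                break
    return best_ap
```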
My dataset has this setup: train (6k images), valid (200 images) and test (700 images).
I will try to increase the data augmentation on the train dataset.
Thank you!
@tangjiuqi097 I am experiencing strange behavior.
As suggested, I applied additional data augmentation. Now I have 27k images in the train dataset.
Before, the ratio was 1:4; now it is 1:20.
But before, I got an AP@0.50 of 90%; now, after the data augmentation, my AP@0.50 is 83%.
Is it possible that more images can negatively influence the CNN training?
Below are the properties used for data augmentation (Keras `ImageDataGenerator` arguments):

```python
data_gen_args = dict(
    brightness_range=[0.4, 1.5],   # random brightness between 40% and 150%
    rotation_range=20,             # random rotations up to 20 degrees
    zoom_range=0.2,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='constant',          # fill exposed pixels with a constant value
    zca_whitening=True,
    cval=0,
)
```
More images should give better performance, but some of these data augmentations may not be suitable.
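For example, `zca_whitening` and `vertical_flip` are often poor fits for natural images, and strong shear/shift ranges can push objects out of the frame. A more conservative configuration might look like the sketch below (illustrative only, not a verified recipe; also note that Keras' `ImageDataGenerator` does not transform bounding boxes, so for detection a box-aware augmentation library is usually needed):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative, more conservative setup: drops ZCA whitening and vertical
# flips, and softens the geometric/photometric ranges.
conservative_args = dict(
    brightness_range=[0.8, 1.2],
    rotation_range=10,
    zoom_range=0.1,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    fill_mode='constant',
    cval=0,
)
datagen = ImageDataGenerator(**conservative_args)
```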
@tangjiuqi097 Thank you again!
I have one more question. Could you please explain the purpose of line 94 of the engine.py file?
Line 94 in f2be216
I am asking because I ran some tests changing this line and got strange results. In short, decreasing the IoU gives me better results, but I thought it would make my results worse.
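For context, in DETR-style codebases that line typically overrides the IoU thresholds the COCO evaluator averages over; a hypothetical reconstruction (the exact code at line 94 of f2be216 may differ):

```python
import numpy as np

# engine.py (around line 94), presumably: when uncommented, this replaces the
# default COCO IoU sweep 0.50:0.95 with a custom set of thresholds,
# e.g. evaluating only at IoU=0.5.
coco_evaluator.coco_eval[iou_type].params.iouThrs = np.array([0.5])
```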
IoU 0.50:0.95: bbox (ep 49): Line 94 commented
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.587
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.904
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.644
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.435
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.628
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.604
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.702
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.763
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.567
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.812
IoU 0.00:0.75: bbox (ep 49): Line 94 uncommented
Average Precision (AP) @[ IoU=0.00:0.75 | area= all | maxDets=100 ] = 0.872
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.00:0.75 | area= small | maxDets=100 ] = 0.250
Average Precision (AP) @[ IoU=0.00:0.75 | area=medium | maxDets=100 ] = 0.766
Average Precision (AP) @[ IoU=0.00:0.75 | area= large | maxDets=100 ] = 0.902
Average Recall (AR) @[ IoU=0.00:0.75 | area= all | maxDets= 1 ] = 0.853
Average Recall (AR) @[ IoU=0.00:0.75 | area= all | maxDets= 10 ] = 0.925
Average Recall (AR) @[ IoU=0.00:0.75 | area= all | maxDets=100 ] = 0.952
Average Recall (AR) @[ IoU=0.00:0.75 | area= small | maxDets=100 ] = 0.250
Average Recall (AR) @[ IoU=0.00:0.75 | area=medium | maxDets=100 ] = 0.866
Average Recall (AR) @[ IoU=0.00:0.75 | area= large | maxDets=100 ] = 0.976
IoU 0.5: bbox (ep 49): Line 94 uncommented [0.5]
Average Precision (AP) @[ IoU=0.50:0.50 | area= all | maxDets=100 ] = 0.904
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.50 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.50 | area=medium | maxDets=100 ] = 0.867
Average Precision (AP) @[ IoU=0.50:0.50 | area= large | maxDets=100 ] = 0.922
Average Recall (AR) @[ IoU=0.50:0.50 | area= all | maxDets= 1 ] = 0.870
Average Recall (AR) @[ IoU=0.50:0.50 | area= all | maxDets= 10 ] = 0.944
Average Recall (AR) @[ IoU=0.50:0.50 | area= all | maxDets=100 ] = 0.971
Average Recall (AR) @[ IoU=0.50:0.50 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.50 | area=medium | maxDets=100 ] = 0.904
Average Recall (AR) @[ IoU=0.50:0.50 | area= large | maxDets=100 ] = 0.992
IoU 0.4: bbox (ep 49): Line 94 uncommented [0.4]
Average Precision (AP) @[ IoU=0.30:0.30 | area= all | maxDets=100 ] = 0.943
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.30:0.30 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.30:0.30 | area=medium | maxDets=100 ] = 0.892
Average Precision (AP) @[ IoU=0.30:0.30 | area= large | maxDets=100 ] = 0.961
Average Recall (AR) @[ IoU=0.30:0.30 | area= all | maxDets= 1 ] = 0.909
Average Recall (AR) @[ IoU=0.30:0.30 | area= all | maxDets= 10 ] = 0.971
Average Recall (AR) @[ IoU=0.30:0.30 | area= all | maxDets=100 ] = 0.978
Average Recall (AR) @[ IoU=0.30:0.30 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.30:0.30 | area=medium | maxDets=100 ] = 0.921
Average Recall (AR) @[ IoU=0.30:0.30 | area= large | maxDets=100 ] = 0.998
IoU 0.3: bbox (ep 49): Line 94 uncommented [0.3]
Average Precision (AP) @[ IoU=0.30:0.30 | area= all | maxDets=100 ] = 0.943
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.30:0.30 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.30:0.30 | area=medium | maxDets=100 ] = 0.892
Average Precision (AP) @[ IoU=0.30:0.30 | area= large | maxDets=100 ] = 0.961
Average Recall (AR) @[ IoU=0.30:0.30 | area= all | maxDets= 1 ] = 0.909
Average Recall (AR) @[ IoU=0.30:0.30 | area= all | maxDets= 10 ] = 0.971
Average Recall (AR) @[ IoU=0.30:0.30 | area= all | maxDets=100 ] = 0.978
Average Recall (AR) @[ IoU=0.30:0.30 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.30:0.30 | area=medium | maxDets=100 ] = 0.921
Average Recall (AR) @[ IoU=0.30:0.30 | area= large | maxDets=100 ] = 0.998
IoU 0.2: bbox (ep 49): Line 94 uncommented [0.2]
Average Precision (AP) @[ IoU=0.20:0.20 | area= all | maxDets=100 ] = 0.956
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.20:0.20 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.20:0.20 | area=medium | maxDets=100 ] = 0.894
Average Precision (AP) @[ IoU=0.20:0.20 | area= large | maxDets=100 ] = 0.978
Average Recall (AR) @[ IoU=0.20:0.20 | area= all | maxDets= 1 ] = 0.920
Average Recall (AR) @[ IoU=0.20:0.20 | area= all | maxDets= 10 ] = 0.972
Average Recall (AR) @[ IoU=0.20:0.20 | area= all | maxDets=100 ] = 0.981
Average Recall (AR) @[ IoU=0.20:0.20 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.20:0.20 | area=medium | maxDets=100 ] = 0.939
Average Recall (AR) @[ IoU=0.20:0.20 | area= large | maxDets=100 ] = 0.998
IoU 0.1: bbox (ep 49): Line 94 uncommented [0.1]
Average Precision (AP) @[ IoU=0.10:0.10 | area= all | maxDets=100 ] = 0.963
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.10:0.10 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.10:0.10 | area=medium | maxDets=100 ] = 0.897
Average Precision (AP) @[ IoU=0.10:0.10 | area= large | maxDets=100 ] = 0.987
Average Recall (AR) @[ IoU=0.10:0.10 | area= all | maxDets= 1 ] = 0.930
Average Recall (AR) @[ IoU=0.10:0.10 | area= all | maxDets= 10 ] = 0.977
Average Recall (AR) @[ IoU=0.10:0.10 | area= all | maxDets=100 ] = 0.983
Average Recall (AR) @[ IoU=0.10:0.10 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.10:0.10 | area=medium | maxDets=100 ] = 0.947
Average Recall (AR) @[ IoU=0.10:0.10 | area= large | maxDets=100 ] = 0.998
IoU 0.0: bbox (ep 49): Line 94 uncommented [0.0] (impossible!!!)
Average Precision (AP) @[ IoU=0.00:0.00 | area= all | maxDets=100 ] = 0.976
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.00:0.00 | area= small | maxDets=100 ] = 1.000
Average Precision (AP) @[ IoU=0.00:0.00 | area=medium | maxDets=100 ] = 0.950
Average Precision (AP) @[ IoU=0.00:0.00 | area= large | maxDets=100 ] = 0.989
Average Recall (AR) @[ IoU=0.00:0.00 | area= all | maxDets= 1 ] = 0.946
Average Recall (AR) @[ IoU=0.00:0.00 | area= all | maxDets= 10 ] = 0.998
Average Recall (AR) @[ IoU=0.00:0.00 | area= all | maxDets=100 ] = 0.998
Average Recall (AR) @[ IoU=0.00:0.00 | area= small | maxDets=100 ] = 1.000
Average Recall (AR) @[ IoU=0.00:0.00 | area=medium | maxDets=100 ] = 1.000
Average Recall (AR) @[ IoU=0.00:0.00 | area= large | maxDets=100 ] = 0.998
@jackonealll hi,
If you use a smaller IoU threshold, low-quality predictions can be treated as true positives, so the reported performance will be better.
This line can be used to look at results under different IoU thresholds. The default metric evaluates performance for IoU thresholds in 0.5:0.95, and of course we should remove this line to follow the default COCO metric.
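For reference, pycocotools builds the default detection IoU thresholds like this:

```python
import numpy as np

# Default COCO detection IoU thresholds (see pycocotools' Params.setDetParams):
iou_thrs = np.linspace(0.5, 0.95, int(np.round((0.95 - 0.5) / 0.05)) + 1, endpoint=True)
print(iou_thrs)  # [0.5  0.55 0.6  0.65 0.7  0.75 0.8  0.85 0.9  0.95]
```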
I understand your point. But I think that if I use a smaller IoU, more false positives would be found, which should negatively impact the other metrics like precision and F1-score. In my case, though, the results get better.
Below are further results with all metrics as the IoU value changes.
| IoU threshold | 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
|---|---|---|---|---|---|---|
| Average Precision | 0.9790000 | 0.9600000 | 0.9550000 | 0.9450000 | 0.9260000 | 0.9010000 |
| Recall | 0.9796875 | 0.9746032 | 0.9650794 | 0.9470405 | 0.9479675 | 0.9268293 |
| Precision | 0.9700000 | 0.9500000 | 0.9400000 | 0.9400000 | 0.9000000 | 0.8800000 |
| F1-score | 0.9748197 | 0.9621443 | 0.9523746 | 0.9435071 | 0.9233612 | 0.9028078 |
| Average Recall | 1.0000000 | 0.9980000 | 0.9980000 | 0.9980000 | 0.9750000 | 0.9670000 |
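As a sanity check, the F1-score row follows directly from the precision and recall rows via F1 = 2PR/(P+R); for example, at IoU 0:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Check against the IoU=0 column of the table above:
print(f1(0.97, 0.9796875))  # ≈ 0.9748197, matching the F1-score row
```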
Also, I agree with you that we should use the 0.5:0.95 IoU threshold range. But in my experiments I need to use a specific IoU like 0.5, which is why I am trying to understand this code, so I can use it the correct way.
> if I use a smaller IoU, more false positives can be found
This IoU threshold is used to evaluate the predictions; it is not the threshold for label assignment.
Maybe what you mean is the confidence score threshold.
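To illustrate the difference: the confidence score threshold filters which predictions are kept at all, before any IoU matching happens. A minimal sketch (assuming COCO-style detection result dicts with a `'score'` key):

```python
def filter_by_score(detections, score_thr=0.5):
    """Keep only predictions whose confidence is at least score_thr.

    `detections` is assumed to be a list of COCO-style result dicts,
    each with a 'score' key; raising score_thr trades recall for
    precision at any fixed evaluation IoU threshold.
    """
    return [d for d in detections if d['score'] >= score_thr]
```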
This issue has not been active for a long time and will be closed in 5 days. Feel free to re-open it if you have further concerns.