Should we expect to see an increase in performance when ensembling models of the same framework?
matt-sharp opened this issue · 4 comments
I have trained multiple models using exactly the same YOLO framework and parameters. I have then created an ensemble using each of the methods: affirmative, consensus, and unanimous. I've done this for both 3 and 5 models.
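To make sure we're talking about the same thing, here's a rough sketch of how I understand the three voting strategies. This is my own simplified illustration, not the repository's code; the `(x1, y1, x2, y2, score)` box format, the greedy grouping, and the IoU threshold are my assumptions:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2, score) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def ensemble(detections, strategy="consensus", iou_thr=0.5):
    """detections = one list of boxes per model; group overlapping boxes, then vote."""
    n_models = len(detections)
    all_boxes = [(m, box) for m, dets in enumerate(detections) for box in dets]
    groups, used = [], set()
    for i, (mi, bi) in enumerate(all_boxes):
        if i in used:
            continue
        group = [(mi, bi)]          # greedily collect boxes that overlap the seed box
        used.add(i)
        for j, (mj, bj) in enumerate(all_boxes):
            if j not in used and iou(bi, bj) >= iou_thr:
                group.append((mj, bj))
                used.add(j)
        groups.append(group)

    kept = []
    for group in groups:
        votes = len({m for m, _ in group})      # how many distinct models proposed this object
        if (strategy == "affirmative"                                   # any model
                or (strategy == "consensus" and votes > n_models / 2)   # majority of models
                or (strategy == "unanimous" and votes == n_models)):    # all models
            # keep the highest-scoring box of the group as the ensemble detection
            kept.append(max((b for _, b in group), key=lambda b: b[4]))
    return kept
```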
I find that the best ensemble F1-score is no better than the best F1-score from any of the individual models.
Here are my results:
The YOLO framework applies random data augmentation (saturation, exposure, hue, and mosaic) during training, which accounts for the variation in performance across the individual models.
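For reference, these augmentations are controlled by keys in the `[net]` section of the darknet .cfg; the values below are the standard YOLOv4 defaults, not necessarily the exact values I trained with:

```
[net]
# colour-space jitter applied randomly during training
saturation = 1.5
exposure = 1.5
hue = .1
# mosaic augmentation (combine 4 training images into one)
mosaic = 1
```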
Here is my config file:
I'm using the HRSC2016 dataset, which contains the following:
train: 436 images including 1207 samples
valid: 181 images including 541 samples
test: 444 images including 1228 samples
Please can you help me understand why I don't see an improvement in performance?
@ancasag thanks for your reply.
Yes, I'm calculating the metrics using the darknet map command, which uses confidence = 0.25 and IoU = 0.5. I see that you use an NMS threshold of 0.3, whereas darknet uses 0.45. Could this be the issue?
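To illustrate what I'm asking about, here's a quick sketch of greedy NMS (reusing the `iou()` helper from my sketch above; this is illustrative, not darknet's or the repository's exact implementation). With `nms_thr = 0.3` more overlapping boxes get suppressed than with darknet's 0.45, so the two settings can produce different final detection sets:

```python
def nms(boxes, nms_thr=0.45):
    """Keep the highest-scoring box, drop any remaining box overlapping it above nms_thr, repeat."""
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while boxes:
        best = boxes.pop(0)
        kept.append(best)
        boxes = [b for b in boxes if iou(best, b) < nms_thr]
    return kept
```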
Sorry, I'm not sure what you mean exactly by obtaining the F1-score directly instead of using darknet?
@matt-sharp, darknet uses confidence = 0.005 for calculating mAP. See this reply from AlexeyAB.
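In other words, the precision/recall/F1 that `darknet detector map` prints are computed only from detections with score >= 0.25, while mAP is computed from nearly all detections by using the very low 0.005 threshold and sweeping confidence internally. A rough sketch of the F1 side (reusing the `iou()` helper from the sketch above; `match_detections` is a simplified greedy matcher, not darknet's exact logic):

```python
def match_detections(dets, gts, iou_thr=0.5):
    """Greedy matching: each ground-truth box can be claimed by at most one detection."""
    matched, tp = set(), 0
    for d in sorted(dets, key=lambda b: b[4], reverse=True):
        for i, g in enumerate(gts):
            if i not in matched and iou(d, g) >= iou_thr:
                matched.add(i)
                tp += 1
                break
    return tp, len(dets) - tp, len(gts) - len(matched)   # tp, fp, fn

def f1_at_threshold(dets, gts, conf_thr=0.25):
    """F1 after discarding detections below conf_thr, as darknet's printout does."""
    kept = [d for d in dets if d[4] >= conf_thr]
    tp, fp, fn = match_detections(kept, gts, iou_thr=0.5)
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    return 2 * precision * recall / (precision + recall + 1e-9)
```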
I am unable to create an ensemble model. Can you help me with how to create an ensemble using this GitHub repository? I have pretrained YOLO and RetinaNet models and want to build an ensemble from them. How do I create it?