giddyyupp/coco-minitrain

Are there any adjustments for training on "minitrain" compared to "train2017"?

Closed this issue a year ago · 5 comments

knn217 commented a year ago

Hello, are the models mentioned on the GitHub page trained with exactly the same methods, parameters, epochs, etc. as on their original dataset? The only difference is the training set itself, right? (from "train2017" to "minitrain")

I'm asking this because I'm currently training Deformable DETR on "minitrain" for my university final project. I'm 26 epochs in (out of a maximum of 50), and all the APs are still at 0. The model seems to perform quite well on the original dataset (I even checked with "val2017"). I kept the whole config the same as for the original model. (I should mention that I divided "minitrain" into 5 folders of 5000 images each, since Google Colab runs into errors with Drive when a single folder contains too many files. I don't think this should affect the result.)

So did you guys encounter this issue as well in your training? Is this normal and should I just wait until 50 epochs, or is there something I'm supposed to change that I'm not aware of? ("minitrain" and the annotations are from the shared file, and the val set is just "val2017".)
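On the folder split mentioned above: whether it affects training depends on how the data loader resolves image paths. COCO-style loaders such as torchvision's `CocoDetection` typically open `os.path.join(root, file_name)` using the `file_name` stored in the annotation JSON, so if the images are moved into subfolders, those `file_name` entries generally need the subfolder prefix as well. The sketch below assumes hypothetical folder names `part_0` to `part_4` and a standard COCO-format annotation dict; check how your own loader builds paths before relying on it.

```python
from pathlib import PurePosixPath


def assign_subfolders(file_names, chunk_size=5000, prefix="part"):
    """Map each image file name to a subfolder holding chunk_size images.

    Returns {original_file_name: "<prefix>_<i>/<original_file_name>"}.
    The "part_<i>" naming scheme is a hypothetical choice for illustration.
    """
    mapping = {}
    for idx, name in enumerate(sorted(file_names)):
        folder = f"{prefix}_{idx // chunk_size}"
        mapping[name] = str(PurePosixPath(folder) / name)
    return mapping


def rewrite_annotations(coco_dict, mapping):
    """Return a copy of a COCO annotation dict whose image "file_name"
    entries point into the subfolders; the input dict is not modified."""
    new = dict(coco_dict)
    new["images"] = [
        {**img, "file_name": mapping.get(img["file_name"], img["file_name"])}
        for img in coco_dict["images"]
    ]
    return new


if __name__ == "__main__":
    # 25,000 zero-padded names, mimicking the size of minitrain.
    names = [f"{i:012d}.jpg" for i in range(25000)]
    mapping = assign_subfolders(names)
    print(mapping["000000000000.jpg"])
    print(mapping["000000024999.jpg"])
```

The rewritten dict can then be saved with `json.dump` next to the split images, so the loader's root directory stays the same and only the per-image paths change.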

giddyyupp commented a year ago

Hi,
We have not changed any parameters for training models on the minitrain dataset. It looks like there is a problem.
If you share the logs, maybe we can help you locate it.

knn217 commented a year ago

Sorry about this, please wait for me to retrain the model, I deleted the output files and logs to make space for the full COCO_2017 training set.

Do I need to train it back to 26 epochs? Even in the early epochs, the APs remain at 0 and don't improve. Some of the ARs improve but never exceed 0.2; the rest remain at 0 like the APs.

knn217 commented a year ago

Ok, 6 epochs done, here's the link to the log file:
https://drive.google.com/file/d/1-Dv4kpCgciUHr4Ig3iRYlsf6kyJgT83W/view?usp=sharing

Here are the results on val2017:

epoch 0:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.003

epoch 1:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.002
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.003
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.006

epoch 2:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.002
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.002
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.003
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.005

epoch 3:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.002
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.003
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.003
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.008

epoch 4:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.002
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.003
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.003
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.007

epoch 5:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.005
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.006
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.006
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.011

Is this normal with other models?

knn217 commented a year ago

My bad, I should've checked the results in the logs, since they aren't rounded.
The model seems to be converging, just really slowly. I'll mess with the learning rate to see if that helps.
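The printed COCO summary rounds every metric to three decimals, so a small but nonzero AP shows up as 0.000. A minimal sketch for reading the unrounded values, assuming the training script writes one JSON object per line to its log file with the 12 bbox stats under a `test_coco_eval_bbox` key (the convention DETR-style training scripts follow; verify the key name in your own log). The sample line in the code is made up for illustration and is not from this thread.

```python
import json

# Order of the 12 values in pycocotools' COCOeval.stats for bbox evaluation,
# matching the order of the printed summary lines.
COCO_STAT_NAMES = [
    "AP", "AP50", "AP75", "AP_small", "AP_medium", "AP_large",
    "AR1", "AR10", "AR100", "AR_small", "AR_medium", "AR_large",
]


def read_bbox_stats(lines, key="test_coco_eval_bbox"):
    """Yield (epoch, {stat_name: value}) for each JSON log line containing `key`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if key in record:
            yield record.get("epoch"), dict(zip(COCO_STAT_NAMES, record[key]))


if __name__ == "__main__":
    # Hypothetical log line, for illustration only.
    sample = ['{"epoch": 0, "test_coco_eval_bbox": '
              '[0.0004, 0.0012, 0.0001, 0, 0, 0.0003, 0.0002, 0.0009, 0.0011, 0, 0, 0.0031]}']
    for epoch, stats in read_bbox_stats(sample):
        # Rounded (as printed by the summary) vs. full precision.
        print(epoch, f"{stats['AP']:.3f}", f"{stats['AP']:.6f}")
```

With a real log, pass an open file handle instead of the list, e.g. `read_bbox_stats(open(path))`, where `path` is wherever your run writes its log.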

knn217 commented a year ago

Update on the situation:
The original Deformable DETR was trained with batch size of 32 and learning rate of 2e-4.
Since my hardware can only train with a batch size of 2, the learning rate should have been (2e-4)/16 or (2e-4)/4 according to the linear or square-root scaling rule (divide by k or sqrt(k), with k = 32/2 = 16).
I'm training with (2e-4)/16 right now; 1 epoch in, the result already looks promising:

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.010
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.038
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.003
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.008
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.015
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.013
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.027
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.076
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.097
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.016
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.087
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.150

Hope this will be helpful if anyone else runs into the same issue. Make sure to check the GitHub page for the training parameters as well; I just went with the default parameters, which were set to a batch size of 2 and a learning rate of 2e-4.
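The learning-rate arithmetic from the update above (base learning rate 2e-4 at batch size 32, scaled down to batch size 2) can be sketched as a small helper. The function name `scale_lr` is hypothetical; it simply applies either the linear rule (multiply by new_batch / base_batch) or the square-root rule (multiply by its square root). For the numbers in this thread it gives 1.25e-05 and 5e-05, i.e. (2e-4)/16 and (2e-4)/4.

```python
import math


def scale_lr(base_lr, base_batch, new_batch, rule="linear"):
    """Rescale a learning rate tuned for base_batch to a new batch size.

    rule="linear": base_lr * (new_batch / base_batch)
    rule="sqrt":   base_lr * sqrt(new_batch / base_batch)
    """
    ratio = new_batch / base_batch  # 2 / 32 = 1/16 in this thread, i.e. 1/k
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown rule: {rule!r}")


if __name__ == "__main__":
    base_lr, base_batch, new_batch = 2e-4, 32, 2  # numbers from this thread
    for rule in ("linear", "sqrt"):
        print(rule, scale_lr(base_lr, base_batch, new_batch, rule))
```

Which rule works better is an empirical question for a given model and optimizer; the results reported above were obtained with the linear rule.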