LiWentomng/OrientedRepPoints

Train with dota-train-dataset (1024, 14384 files), the mAP on dota-val-dataset is 70.84


Thank you for your code. I'm learning how to use it, but I've run into some problems and hope to get your help.
config: orientedreppoints_r50_demo.py
changes:
img_per_gpu=2 -> img_per_gpu=4
workers_per_gpu=2 -> workers_per_gpu=4
lr=0.01 -> lr=0.005
environment: 2 GPUs (Tesla P40)
mAP on val: 70.84
class APs: [89.43 73.79 40.19 66.33 73.53 82.06 88.16 90.86 60.59 86.46 65.51 64.86 71.29 57.60 51.94]
my question: when I use your checkpoint (trained on the trainval dataset) to detect on the dota-val-dataset, the mAP is about 82.
But the mAP of 70.84 (checkpoint trained on the dota-train-dataset, tested on val) feels lower than I expected (73 ~ 75). Is this normal?
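
For reference, a rough sketch of where the changed fields sit in the config (this is only the fields I touched, not the full file; the exact field names in orientedreppoints_r50_demo.py may differ slightly, and the SGD momentum/weight_decay here are just the usual mmdet defaults):

    # Sketch of the changed fields only, not the full config.
    data = dict(
        img_per_gpu=4,       # changed from 2
        workers_per_gpu=4,   # changed from 2
    )
    optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001)  # lr changed from 0.01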

Training on the train dataset and evaluating on the val dataset, my result reaches an mAP of 73.37447.
class APs: [89.89954584 75.09381718 51.91760568 69.30359075 75.60788996 82.47240929
88.02548317 90.72148874 66.22466264 87.10500443 69.58421786 68.80032583
72.45845151 61.51307246 51.88949827].
My trained model is here (password: aabb). You can try it.

I guess your lower result comes from these three aspects:

  1. My train set includes 15749 files (subsize=1024x1024, gap=200), which is more than yours. I use the script prepare_dota1_train_val.py to prepare the train and val datasets; you can refer to it.

  2. The learning rate is a sensitive factor for model training. My environment is as follows: 8 RTX 2080 Ti GPUs with 2 imgs per GPU (a total batch size of 16, versus 8 in your 2-GPU, 4-imgs-per-GPU setup), so a somewhat smaller learning rate is reasonable for you.
    You can try a learning rate of 0.006 or 0.008.

  3. You can also add “RandomRotate” to the config to get a better mAP, as follows:
    dict(type='RandomRotate', rate=0.5, angles=[30, 60, 90, 120, 150], auto_bound=False)
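
For example, it would sit in the train_pipeline of the config roughly like this (the surrounding transform names here are only placeholders from a typical mmdet-style pipeline, not copied from this repo's config):

    # Sketch only: transforms other than RandomRotate are placeholders.
    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True),
        # Rotate half of the training images by an angle drawn from the list below.
        dict(type='RandomRotate', rate=0.5, angles=[30, 60, 90, 120, 150], auto_bound=False),
        dict(type='Resize', img_scale=(1024, 1024), keep_ratio=True),
        dict(type='RandomFlip', flip_ratio=0.5),
        dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
        dict(type='Pad', size_divisor=32),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
    ]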

If you have any further questions about this, please let me know. I'll try to help you get the expected results.

Yeah, the learning rate does have a significant impact on the results. I got mAP 65 with 2 Tesla P40s, 4 imgs per GPU, lr=0.01 (trained on the dota-train-dataset, tested on the dota-val-dataset).
My dota-train-dataset includes 14384 files (subsize=1024×1024, gap=100); maybe that's what makes the difference in results.
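
Rough arithmetic of how the gap changes the number of crops per image, assuming the usual DOTA-style sliding-window split (standalone sketch, not the actual split script):

    import math

    def windows_per_dim(length, subsize=1024, gap=200):
        # The split window slides in steps of (subsize - gap); the last window
        # is shifted back to stay inside the image, so we count start positions.
        stride = subsize - gap
        if length <= subsize:
            return 1
        return math.ceil((length - subsize) / stride) + 1

    def tiles_per_image(width, height, subsize=1024, gap=200):
        return windows_per_dim(width, subsize, gap) * windows_per_dim(height, subsize, gap)

    # Example: a 2800 x 2800 DOTA image.
    print(tiles_per_image(2800, 2800, gap=200))  # 16 crops (4 x 4)
    print(tiles_per_image(2800, 2800, gap=100))  # 9 crops (3 x 3)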

Have you tried mixed precision training?
I added ‘fp16 = dict(loss_scale=512.)’ to the config file, but the mAP is just 4.78.
BTW: the mAP is 74.98 with the same config file and FP32 training.

I haven't tried mixed precision training with this model.
As far as I know, the Tesla P40 may not support FP16.
Besides, on a GPU that does support it, loss_scale=512 sets how much the loss (and therefore the gradients) is scaled up during training; an appropriate range is roughly 0-1000. I guess the model parameters were barely updated because the gradients are too small (they underflow) with FP16. Maybe a larger value will give a better result.
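
If you want to keep experimenting with FP16, something like this in the config (whether the dynamic option is available depends on the mmcv/mmdet version this code is built on):

    # A larger static loss scale (still within the rough 0-1000 range above)
    # scales the loss up more, so small gradients are less likely to underflow in FP16.
    fp16 = dict(loss_scale=1000.)

    # Or, if the underlying mmcv/mmdet version supports it, dynamic loss scaling:
    # fp16 = dict(loss_scale='dynamic')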