sming256/OpenTAD

Challenges in replicating AdaTAD


Hi!

I found that when I reproduce AdaTAD on the THUMOS dataset, the mAP I get is consistently much lower than the one reported in the paper. I describe my training process and parameters below.

I ran the command:
torchrun --nnodes=1 --nproc_per_node=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:0 tools/train.py configs/adatad/thumos/e2e_thumos_videomae_s_768x1_160_adapter.py

After completing 60 epochs of training, the result was:

2024-09-28 16:37:35 Train INFO: Evaluation starts...
2024-09-28 16:37:53 Train INFO: Loaded annotations from validation subset.
2024-09-28 16:37:53 Train INFO: Number of ground truth instances: 3325
2024-09-28 16:37:53 Train INFO: Number of predictions: 422000
2024-09-28 16:37:53 Train INFO: Fixed threshold for tiou score: [0.3, 0.4, 0.5, 0.6, 0.7]
2024-09-28 16:37:53 Train INFO: Average-mAP: 60.30 (%)
2024-09-28 16:37:53 Train INFO: mAP at tIoU 0.30 is 79.42%
2024-09-28 16:37:53 Train INFO: mAP at tIoU 0.40 is 72.74%
2024-09-28 16:37:53 Train INFO: mAP at tIoU 0.50 is 62.65%
2024-09-28 16:37:53 Train INFO: mAP at tIoU 0.60 is 50.92%
2024-09-28 16:37:53 Train INFO: mAP at tIoU 0.70 is 35.77%
2024-09-28 16:37:53 Train INFO: Training Over...

The average mAP of 60.30 is significantly lower than the 69.03 reported in the paper. Note that I made some changes to the parameters in the configuration file; I list all of them below:

  1. I changed train=dict(batch_size=2, num_workers=2), val=dict(batch_size=2, num_workers=2) to train=dict(batch_size=8, num_workers=11), val=dict(batch_size=2, num_workers=3). I found that this maximizes the utilization of my machine while keeping training stable.

  2. I increased the learning rate of the backbone in the optimizer from 1e-4 to 2e-4 and the learning rate of the adapter from 2e-4 to 4e-4. Since I can only train on one GPU and my total batch size is 8, while the provided configuration runs on two GPUs with a total batch size of 4, I doubled the learning rates according to the linear scaling rule (a sketch of changes 1 and 2 follows after this list).

  3. I changed the parameters that control how training progress is printed in the workflow, but end_epoch remains 60, so this does not affect the final training result.
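
For concreteness, here is a minimal sketch of the overrides from points 1 and 2, written in the Python-dict style the OpenTAD configs use. The top-level key names (solver, optimizer) and the exact nesting are my assumptions about how the released config is organized; only the values come from my changes.

```python
# Sketch of the overrides from points (1) and (2) above.
# Key names are assumptions; the values are exactly the ones I listed.
solver = dict(
    train=dict(batch_size=8, num_workers=11),  # released config: batch_size=2, num_workers=2
    val=dict(batch_size=2, num_workers=3),     # released config: batch_size=2, num_workers=2
)

# Linear scaling rule: lr is multiplied by (new total batch) / (old total batch) = 8 / 4 = 2
optimizer = dict(
    backbone=dict(lr=2e-4),  # released config: 1e-4
    adapter=dict(lr=4e-4),   # released config: 2e-4
)
```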

All other training parameters remain consistent with the original ones. The complete training log is provided here.

I wonder whether my hyperparameter settings are causing the lower final performance. Could you please give me some suggestions?

thanks

First, adjusting the batch size significantly affects detection performance on the THUMOS dataset. Although a larger batch size gives better GPU utilization, the detector (i.e., ActionFormer) prefers a smaller batch size on this dataset.

Second, the default setting uses a total batch size of 2, i.e., 1 sample per GPU when using 2 GPUs. I suggest training the model with the released config as-is; the result should be close to 69%.
You can run the same config on either 1 or 2 GPUs, keeping the total batch size at 2.
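
To make the batch-size bookkeeping explicit, here is a small sketch of the arithmetic (not code from the repo):

```python
# Effective (total) batch size = per-GPU batch_size * number of GPU processes.
def total_batch_size(per_gpu_batch_size: int, nproc_per_node: int) -> int:
    return per_gpu_batch_size * nproc_per_node

print(total_batch_size(2, 1))  # 1 GPU,  batch_size=2 per GPU -> total 2 (recommended)
print(total_batch_size(1, 2))  # 2 GPUs, batch_size=1 per GPU -> total 2 (recommended)
print(total_batch_size(8, 1))  # 1 GPU,  batch_size=8 per GPU -> total 8 (too large here)
```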

Thank you for your suggestion; it really worked.