taoyang1122/adapt-image-models

Can't reproduce Something-Something v2 training and results

Opened this issue · 7 comments

or7r commented

Hello!

I want to reproduce your model training and results on the Something-Something v2 dataset, but failed. I used your SSv2 config file, vitclip_base_sthv2.py,
and the command:
bash tools/dist_train.sh <PATH/TO/CONFIG> <NUM_GPU> --test-last --validate --cfg-options work_dir=<PATH/TO/OUTPUT>

I followed the MMAction2 data preparation procedure described here, which you referred to.

Environment info:
python 3.8.16, pytorch 1.10.0, torchvision 0.11.0, cudatoolkit 11.3.1, mmcv 1.4.0

I used 8 NVIDIA A100 40GB GPUs for my reproduction attempt.

I attach the last lines of the log file:

2023-06-11 10:18:25,528 - mmaction - INFO - Epoch [50][2560/2640]	lr: 2.960e-07, eta: 0:02:06, time: 1.552, data_time: 0.008, memory: 10710, loss_cls: 2.8023, loss: 2.8023
2023-06-11 10:18:57,682 - mmaction - INFO - Epoch [50][2580/2640]	lr: 2.960e-07, eta: 0:01:35, time: 1.608, data_time: 0.009, memory: 10710, loss_cls: 2.6813, loss: 2.6813
2023-06-11 10:19:28,511 - mmaction - INFO - Epoch [50][2600/2640]	lr: 2.960e-07, eta: 0:01:03, time: 1.542, data_time: 0.190, memory: 10710, loss_cls: 2.8156, loss: 2.8156
2023-06-11 10:20:01,052 - mmaction - INFO - Epoch [50][2620/2640]	lr: 2.960e-07, eta: 0:00:31, time: 1.621, data_time: 0.005, memory: 10710, loss_cls: 2.7159, loss: 2.7159
2023-06-11 10:20:28,968 - mmaction - INFO - Epoch [50][2640/2640]	lr: 2.960e-07, eta: 0:00:00, time: 1.403, data_time: 0.104, memory: 10710, loss_cls: 2.6996, loss: 2.6996
2023-06-11 10:20:30,023 - mmaction - INFO - Saving checkpoint at 50 epochs
2023-06-11 10:27:48,532 - mmaction - INFO - Evaluating top_k_accuracy ...
2023-06-11 10:27:49,280 - mmaction - INFO - 
top1_acc	0.3803
top5_acc	0.6874
2023-06-11 10:27:49,280 - mmaction - INFO - Evaluating mean_class_accuracy ...
2023-06-11 10:27:49,319 - mmaction - INFO - 
mean_acc	0.3086
2023-06-11 10:27:49,371 - mmaction - INFO - The previous best checkpoint [REDACTED]/best_top1_acc_epoch_45.pth was removed
2023-06-11 10:27:50,692 - mmaction - INFO - Now best checkpoint is saved as best_top1_acc_epoch_50.pth.
2023-06-11 10:27:50,693 - mmaction - INFO - Best top1_acc is 0.3803 at 50 epoch.
2023-06-11 10:27:50,702 - mmaction - INFO - Epoch(val) [50][3098]	top1_acc: 0.3803, top5_acc: 0.6874, mean_class_accuracy: 0.3086
2023-06-11 10:37:59,836 - mmaction - INFO - Testing results of the last checkpoint
2023-06-11 10:37:59,836 - mmaction - INFO - top1_acc: 0.3875
2023-06-11 10:37:59,836 - mmaction - INFO - top5_acc: 0.6963
2023-06-11 10:37:59,837 - mmaction - INFO - mean_class_accuracy: 0.3136

According to my understanding, the results reported in the paper for this configuration are 66.4 top-1 accuracy and 90.5 top-5 accuracy. Yet, as the logs show, the results I obtained are far worse.

Am I missing something?
Please let me know. Thank you.

Hello, I had problems with that too.

Hi, I didn't have this problem. Could you please make sure that CLIP is installed and the pre-trained ViT is loaded correctly? You can install CLIP with pip install git+https://github.com/openai/CLIP.git. You should see high accuracy within the first few epochs.
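As a quick sanity check that the CLIP package is importable (it also confirms which model names are available for loading):

```shell
# Verify that the openai/CLIP package is installed and importable;
# "ViT-B/16" should appear in the printed list
python -c "import clip; print(clip.available_models())"
```

If this fails with an ImportError, the pre-trained ViT cannot be loaded and training starts from an uninitialized backbone.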

or7r commented

Hi,
I had already installed CLIP with that command.

I have also tried adding the argument
--cfg-options model.backbone.pretrained=openaiclip
as in the Diving48 example script, where I was able to approximately reproduce the results.

Even with this additional argument, the results on SSv2 are not close to those reported in the paper.
From the reproduction attempt with the additional argument:

2023-06-13 23:30:39,761 - mmaction - INFO - Epoch [50][2580/2640]	lr: 2.960e-07, eta: 0:01:34, time: 1.607, data_time: 0.005, memory: 10710, loss_cls: 2.7282, loss: 2.7282
2023-06-13 23:31:11,810 - mmaction - INFO - Epoch [50][2600/2640]	lr: 2.960e-07, eta: 0:01:03, time: 1.602, data_time: 0.008, memory: 10710, loss_cls: 2.7860, loss: 2.7860
2023-06-13 23:31:43,233 - mmaction - INFO - Epoch [50][2620/2640]	lr: 2.960e-07, eta: 0:00:31, time: 1.571, data_time: 0.008, memory: 10710, loss_cls: 2.7098, loss: 2.7098
2023-06-13 23:32:10,852 - mmaction - INFO - Epoch [50][2640/2640]	lr: 2.960e-07, eta: 0:00:00, time: 1.385, data_time: 0.006, memory: 10710, loss_cls: 2.7527, loss: 2.7527
2023-06-13 23:32:11,941 - mmaction - INFO - Saving checkpoint at 50 epochs
2023-06-13 23:39:29,249 - mmaction - INFO - Evaluating top_k_accuracy ...
2023-06-13 23:39:29,992 - mmaction - INFO - 
top1_acc	0.3661
top5_acc	0.6707
2023-06-13 23:39:29,993 - mmaction - INFO - Evaluating mean_class_accuracy ...
2023-06-13 23:39:30,031 - mmaction - INFO - 
mean_acc	0.2952
2023-06-13 23:39:30,351 - mmaction - INFO - The previous best checkpoint [REDACTED]/best_top1_acc_epoch_45.pth was removed
2023-06-13 23:39:31,701 - mmaction - INFO - Now best checkpoint is saved as best_top1_acc_epoch_50.pth.
2023-06-13 23:39:31,702 - mmaction - INFO - Best top1_acc is 0.3661 at 50 epoch.
2023-06-13 23:39:31,711 - mmaction - INFO - Epoch(val) [50][3098]	top1_acc: 0.3661, top5_acc: 0.6707, mean_class_accuracy: 0.2952
2023-06-13 23:49:39,738 - mmaction - INFO - Testing results of the last checkpoint
2023-06-13 23:49:39,739 - mmaction - INFO - top1_acc: 0.3722
2023-06-13 23:49:39,739 - mmaction - INFO - top5_acc: 0.6744
2023-06-13 23:49:39,739 - mmaction - INFO - mean_class_accuracy: 0.3010

Hi, I cannot tell the problems from the shared log. Could you share the full training log with me?

or7r commented

Hi,
the training was split into three runs, resumed via the checkpoint option.
I am attaching the log files covering the full training.
log1.log
log2.log
log3.log

Hi, I have the same issue. I also got similar results, around 30-40% top-1 accuracy.

Hi, it seems that the CLIP pre-trained models are not being loaded correctly. I guess I missed some arguments in the README. You need to add model.backbone.pretrained=openaiclip, as shown in the example script here. I will update the README. Sorry for the confusion.
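For clarity, the full training command with that option added would look roughly like this (using the same placeholders as the original command; both cfg-options are passed to the one --cfg-options flag):

```shell
# Same command as before, now explicitly loading the CLIP pre-trained backbone
bash tools/dist_train.sh <PATH/TO/CONFIG> <NUM_GPU> --test-last --validate \
    --cfg-options model.backbone.pretrained=openaiclip work_dir=<PATH/TO/OUTPUT>
```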