Slow Training on Kinetics700
Hello, I'm fine-tuning the MobileNetV2 W1.0 checkpoint on Kinetics-700 (I checked that my annotation JSON format matches yours as closely as possible).
However, the model does not seem to learn much and the accuracy stays very low, as you can see here:
epoch  loss    prec1  prec5  lr
1      6.7049  0.160  0.661  0.1
2      6.5698  0.205  1.011  0.1
3      6.5043  0.276  1.258  0.1
4      6.4520  0.280  1.362  0.1
5      6.3999  0.340  1.690  0.1
6      6.3538  0.381  2.123  0.1
7      6.3170  0.526  2.220  0.1
I would normally expect a meaningful improvement between epochs 1 and 10, though I'm not sure my reasoning is correct. For fine-tuning I used the exact command provided in the repo:
--dataset kinetics \
--n_classes 600 \
--n_finetune_classes 700 \
--ft_portion last_layer \
--model mobilenetv2 \
--groups 3 \
--lr_steps 20 \
--width_mult 1 \
--train_crop random \
--learning_rate 0.1 \
--sample_duration 16 \
--downsample 1 \
--batch_size 64 \
--n_threads 32 \
--checkpoint 5 \
--n_val_samples 1 \
--n_epochs 20 \
I'm currently fine-tuning only the last layer; I'll try the full network as well, though I doubt it will make much difference.
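For reference, my understanding is that --ft_portion last_layer boils down to something like the sketch below; the repo's actual helper may be structured differently, and the module names here are stand-ins.

```python
import torch.nn as nn
import torch.optim as optim

# Stand-in model; only the parameter names matter for this sketch.
model = nn.Sequential()
model.add_module("features", nn.Linear(16, 32))
model.add_module("classifier", nn.Linear(32, 700))

def last_layer_params(model, last_layer_name="classifier"):
    """Freeze everything except the final classifier."""
    for name, param in model.named_parameters():
        param.requires_grad = last_layer_name in name
    return [p for p in model.parameters() if p.requires_grad]

# Only the unfrozen head is handed to the optimizer.
optimizer = optim.SGD(last_layer_params(model), lr=0.1, momentum=0.9)
```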
There is definitely something wrong with the training. The accuracy should be very high even by the end of the first epoch. The training configuration seems right (though you may want to remove --lr_steps, since it expects a list; modify it in the opts file instead).
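For what it's worth, a list-valued lr_steps maps onto PyTorch's MultiStepLR; whether the repo schedules the lr exactly this way is an assumption on my part, and the milestone values below are illustrative.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for model.parameters()
optimizer = SGD(params, lr=0.1, momentum=0.9)

# lr_steps as a list of milestone epochs: the lr is multiplied by 0.1
# at epochs 20 and 40 (values illustrative, not the repo's defaults).
scheduler = MultiStepLR(optimizer, milestones=[20, 40], gamma=0.1)

for epoch in range(60):
    # train_one_epoch(...)  # training loop elided
    scheduler.step()
```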
Could you please quickly try fine-tuning on the UCF dataset with the same configuration? If UCF trains successfully, maybe there is an issue with the Kinetics-700 dataloader.
Now I notice: you also need to add "--pretrain_path" for fine-tuning.
I am using a pre-trained model. Sorry for not sharing the full .sh file.
--pretrain_path models/Efficient-3DCNNs/param_checkpoints/pre_trained_orig/kinetics_mobilenetv2_1.0x_RGB_16_best.pth \
It is the same one as the one in your linked google drive.
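For completeness, my understanding of the fine-tuning path is roughly the sketch below, with a stand-in model; the "state_dict" key and the "module." prefix handling are assumptions on my side.

```python
import torch
import torch.nn as nn

# Stand-in for the 3D MobileNetV2; only the head swap matters here.
class TinyNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Linear(16, 32)
        self.classifier = nn.Linear(32, num_classes)

model = TinyNet(num_classes=600)  # head sized for the Kinetics-600 checkpoint
state = torch.load("kinetics_mobilenetv2_1.0x_RGB_16_best.pth", map_location="cpu")

# Checkpoints saved through nn.DataParallel usually prefix keys with "module.".
weights = {k.replace("module.", "", 1): v for k, v in state["state_dict"].items()}
model.load_state_dict(weights, strict=False)  # strict=False only because TinyNet is a stand-in

model.classifier = nn.Linear(32, 700)  # fresh 700-way head for n_finetune_classes
```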
Do you think it could be because the class label indices changed in the 700-class version? I was able to reach 7% top-1 accuracy after 80 epochs, but that seems like too much training time for such small gains.
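To rule out an annotation problem, I sanity-checked the label set in my JSON along these lines; the key names follow the ActivityNet-style layout these repos use, which is worth double-checking against your own file.

```python
import json

with open("kinetics700.json") as f:  # hypothetical path to the annotation file
    data = json.load(f)

labels = data["labels"]              # top-level class list in ActivityNet-style JSON
print(len(labels))                   # should print 700 for Kinetics-700

# Every label referenced by a video should appear in that list.
used = {v["annotations"]["label"]
        for v in data["database"].values()
        if v.get("annotations")}     # test-split entries may carry no annotations
print(used <= set(labels))           # should print True
```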
epoch  loss    prec1  prec5
1      4.2135   4.36  17.53
2      3.9561   6.24  25.67
3      3.8914   9.15  27.57
4      3.9743   7.40  28.10
5      3.5936  12.66  35.82
6      3.5310  14.64  38.09
7      3.3855  17.34  43.91
8      3.1459  18.53  50.86
9      3.0502  21.65  54.37
10     2.9713  22.97  54.40
11     3.1185  19.88  52.23
12     2.9537  23.39  56.57
13     3.1291  21.52  55.35
14     2.8720  26.04  57.49
15     2.9719  24.48  55.01
That was UCF101 over 15 epochs. Is this also abnormal?
It seems normal on UCF. Maybe you can try training Kinetics-700 from scratch; after 3-4 epochs, top-1 accuracy should be above 10%.
Thank you for your response. Another question: after training ShuffleNet v1 on UCF-101, I get the following final validation accuracy after ~52 epochs:
epoch 52: loss 1.9175, prec1 50.89, prec5 79.33
And this is my training set accuracy:
epoch 52: loss 1.3898, prec1 62.94, prec5 87.78, lr 0.001
From your paper, I understand the top-1 accuracy on UCF-101 should be 84.96%, so I don't know why I'm getting about 20 percent lower. I trained for 25 more epochs to no avail.
Footnote: in shufflenet.py I changed nn.Dropout(0.2) to nn.Dropout(0.9), following Sec. 3.2 of your paper.
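Concretely, that edit is just the dropout probability in the classifier head, along these lines (the module layout and channel width here are a sketch, not the exact shufflenet.py code):

```python
import torch.nn as nn

# Classifier head as it roughly looks before the change (width illustrative).
classifier = nn.Sequential(nn.Dropout(0.2), nn.Linear(1920, 101))

# The change: a much stronger dropout, following Sec. 3.2 of the paper.
classifier[0] = nn.Dropout(0.9)
```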
Here is my config for fine-tuning:
python ../main.py --root_path /projects \
--video_path datasets/ucf101/UCF-101 \
--annotation_path models/Efficient-3DCNNs/annotation_UCF101/ucf101_01.json \
--result_path models/Efficient-3DCNNs/results_new/v1_ucf101_shufflenet/ \
--pretrain_path models/Efficient-3DCNNs/param_checkpoints/pre_trained_orig/kinetics_shufflenet_2.0x_G3_RGB_16_best.pth \
--dataset ucf101 \
--n_classes 600 \
--n_finetune_classes 101 \
--ft_portion last_layer \
--model shufflenet \
--groups 3 \
--width_mult 2.0 \
--train_crop random \
--learning_rate 0.1 \
--sample_duration 16 \
--downsample 1 \
--batch_size 64 \
--n_threads 16 \
--checkpoint 1 \
--n_val_samples 1 \
--n_epochs 60 \
You are seeing clip accuracy there. You need to calculate video accuracy once you finish training.
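In other words, clip accuracy scores every 16-frame clip independently, while video accuracy first averages the clip scores per video and then takes a single argmax. A minimal sketch of that aggregation (function and argument names are hypothetical, not the repo's video_accuracy.py):

```python
from collections import defaultdict
import numpy as np

def video_accuracy(clip_scores, clip_video_ids, video_labels):
    """clip_scores: (n_clips, n_classes) softmax outputs, one row per clip;
    clip_video_ids: video id for each clip; video_labels: {video id: true class}."""
    per_video = defaultdict(list)
    for scores, vid in zip(clip_scores, clip_video_ids):
        per_video[vid].append(scores)
    # Average the clip scores per video, then take the argmax once per video.
    correct = sum(int(np.mean(per_video[vid], axis=0).argmax() == label)
                  for vid, label in video_labels.items())
    return correct / len(video_labels)
```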
Running the model in test mode and then the commented UCF101 code in video_accuracy.py gives a top-1 accuracy of 0.5527359238699445.
Over the weekend I trained shufflenetv1_2.0x on UCF101 from the pretrained Kinetics model without changing any code in the repo, and achieved 84.9% video accuracy. Please check your code again or re-clone the project.