Slow Training on Kinetics700
Hello, I'm fine-tuning the MobileNetV2 W1.0 checkpoint on Kinetics-700 (I checked that my annotation JSON format matches yours as closely as possible).
However, the model does not seem to learn much and the accuracy stays very low, as you can see here:
epoch  loss    prec1  prec5  lr
1      6.7049  0.160  0.661  0.1
2      6.5698  0.205  1.011  0.1
3      6.5043  0.276  1.258  0.1
4      6.4520  0.280  1.362  0.1
5      6.3999  0.340  1.690  0.1
6      6.3538  0.381  2.123  0.1
7      6.3170  0.526  2.220  0.1
I would normally expect a meaningful improvement between epochs 1 and 10, though I'm not sure my reasoning is correct. For fine-tuning I used the exact command provided in the repo:
--dataset kinetics \
--n_classes 600 \
--n_finetune_classes 700 \
--ft_portion last_layer \
--model mobilenetv2 \
--groups 3 \
--lr_steps 20 \
--width_mult 1 \
--train_crop random \
--learning_rate 0.1 \
--sample_duration 16 \
--downsample 1 \
--batch_size 64 \
--n_threads 32 \
--checkpoint 5 \
--n_val_samples 1 \
--n_epochs 20 \
I'm currently fine-tuning only the last layer; I'll try the full network as well, though I doubt it will make much difference.
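For reference, my understanding is that --ft_portion last_layer boils down to something like the sketch below; the repo's actual helper may be structured differently, and the module names here are stand-ins.

```python
import torch.nn as nn
import torch.optim as optim

# Stand-in model; only the parameter names matter for this sketch.
model = nn.Sequential()
model.add_module("features", nn.Linear(16, 32))
model.add_module("classifier", nn.Linear(32, 700))

def last_layer_params(model, last_layer_name="classifier"):
    """Freeze everything except the final classifier."""
    for name, param in model.named_parameters():
        param.requires_grad = last_layer_name in name
    return [p for p in model.parameters() if p.requires_grad]

# Only the unfrozen head is handed to the optimizer.
optimizer = optim.SGD(last_layer_params(model), lr=0.1, momentum=0.9)
```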
There is definitely something wrong with the training. The accuracy should be very high even by the end of the first epoch. The training configuration seems right (though you may want to remove --lr_steps, since it expects a list; modify it in the opts file instead).
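For what it's worth, a list-valued lr_steps maps onto PyTorch's MultiStepLR; whether the repo schedules the lr exactly this way is an assumption on my part, and the milestone values below are illustrative.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for model.parameters()
optimizer = SGD(params, lr=0.1, momentum=0.9)

# lr_steps as a list of milestone epochs: the lr is multiplied by 0.1
# at epochs 20 and 40 (values illustrative, not the repo's defaults).
scheduler = MultiStepLR(optimizer, milestones=[20, 40], gamma=0.1)

for epoch in range(60):
    # train_one_epoch(...)  # training loop elided
    scheduler.step()
```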
Could you please quickly try fine-tuning on the UCF dataset with the same configuration? If UCF trains successfully, maybe there is an issue with the Kinetics-700 dataloader.
Now I notice: you also need to add "--pretrain_path" for fine-tuning.
I am using a pre-trained model. Sorry for not sharing the full .sh file.
--pretrain_path models/Efficient-3DCNNs/param_checkpoints/pre_trained_orig/kinetics_mobilenetv2_1.0x_RGB_16_best.pth \
It is the same one as the one in your linked google drive.
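For completeness, my understanding of the fine-tuning path is roughly the sketch below, with a stand-in model; the "state_dict" key and the "module." prefix handling are assumptions on my side.

```python
import torch
import torch.nn as nn

# Stand-in for the 3D MobileNetV2; only the head swap matters here.
class TinyNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Linear(16, 32)
        self.classifier = nn.Linear(32, num_classes)

model = TinyNet(num_classes=600)  # head sized for the Kinetics-600 checkpoint
state = torch.load("kinetics_mobilenetv2_1.0x_RGB_16_best.pth", map_location="cpu")

# Checkpoints saved through nn.DataParallel usually prefix keys with "module.".
weights = {k.replace("module.", "", 1): v for k, v in state["state_dict"].items()}
model.load_state_dict(weights, strict=False)  # strict=False only because TinyNet is a stand-in

model.classifier = nn.Linear(32, 700)  # fresh 700-way head for n_finetune_classes
```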
Do you think it could be because the class label indices changed in the 700-class version? I was able to reach 7% top-1 accuracy after 80 epochs, but that seems like too much training time for such small gains.
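To rule out an annotation problem, I sanity-checked the label set in my JSON along these lines; the key names follow the ActivityNet-style layout these repos use, which is worth double-checking against your own file.

```python
import json

with open("kinetics700.json") as f:  # hypothetical path to the annotation file
    data = json.load(f)

labels = data["labels"]              # top-level class list in ActivityNet-style JSON
print(len(labels))                   # should print 700 for Kinetics-700

# Every label referenced by a video should appear in that list.
used = {v["annotations"]["label"]
        for v in data["database"].values()
        if v.get("annotations")}     # test-split entries may carry no annotations
print(used <= set(labels))           # should print True
```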
epoch  loss    prec1  prec5
1      4.2135   4.36  17.53
2      3.9561   6.24  25.67
3      3.8914   9.15  27.57
4      3.9743   7.40  28.10
5      3.5936  12.66  35.82
6      3.5310  14.64  38.09
7      3.3855  17.34  43.91
8      3.1459  18.53  50.86
9      3.0502  21.65  54.37
10     2.9713  22.97  54.40
11     3.1185  19.88  52.23
12     2.9537  23.39  56.57
13     3.1291  21.52  55.35
14     2.8720  26.04  57.49
15     2.9719  24.48  55.01
That was UCF101 over 15 epochs. Is this also abnormal?
It seems normal on UCF. Maybe you can try training Kinetics-700 from scratch; after 3-4 epochs, top-1 accuracy should be above 10%.
Thank you for your response. Another question: after training ShuffleNet v1 on UCF-101, I get the following final validation accuracy after ~52 epochs:
epoch 52: loss 1.9175, prec1 50.89, prec5 79.33
And this is my training set accuracy:
epoch 52: loss 1.3898, prec1 62.94, prec5 87.78, lr 0.001
From your paper, I understand the top-1 accuracy on UCF-101 should be 84.96%, so I don't know why I'm getting about 20 percent lower. I trained for 25 more epochs to no avail.
Footnote: in shufflenet.py I changed nn.Dropout(0.2) to nn.Dropout(0.9), following Sec. 3.2 of your paper.
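Concretely, that edit is just the dropout probability in the classifier head, along these lines (the module layout and channel width here are a sketch, not the exact shufflenet.py code):

```python
import torch.nn as nn

# Classifier head as it roughly looks before the change (width illustrative).
classifier = nn.Sequential(nn.Dropout(0.2), nn.Linear(1920, 101))

# The change: a much stronger dropout, following Sec. 3.2 of the paper.
classifier[0] = nn.Dropout(0.9)
```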
Here is my config for fine-tuning:
python ../main.py --root_path /projects \
--video_path datasets/ucf101/UCF-101 \
--annotation_path models/Efficient-3DCNNs/annotation_UCF101/ucf101_01.json \
--result_path models/Efficient-3DCNNs/results_new/v1_ucf101_shufflenet/ \
--pretrain_path models/Efficient-3DCNNs/param_checkpoints/pre_trained_orig/kinetics_shufflenet_2.0x_G3_RGB_16_best.pth \
--dataset ucf101 \
--n_classes 600 \
--n_finetune_classes 101 \
--ft_portion last_layer \
--model shufflenet \
--groups 3 \
--width_mult 2.0 \
--train_crop random \
--learning_rate 0.1 \
--sample_duration 16 \
--downsample 1 \
--batch_size 64 \
--n_threads 16 \
--checkpoint 1 \
--n_val_samples 1 \
--n_epochs 60 \
You are seeing clip accuracy there. You need to calculate video accuracy once you finish training.
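In other words, clip accuracy scores every 16-frame clip independently, while video accuracy first averages the clip scores per video and then takes a single argmax. A minimal sketch of that aggregation (function and argument names are hypothetical, not the repo's video_accuracy.py):

```python
from collections import defaultdict
import numpy as np

def video_accuracy(clip_scores, clip_video_ids, video_labels):
    """clip_scores: (n_clips, n_classes) softmax outputs, one row per clip;
    clip_video_ids: video id for each clip; video_labels: {video id: true class}."""
    per_video = defaultdict(list)
    for scores, vid in zip(clip_scores, clip_video_ids):
        per_video[vid].append(scores)
    # Average the clip scores per video, then take the argmax once per video.
    correct = sum(int(np.mean(per_video[vid], axis=0).argmax() == label)
                  for vid, label in video_labels.items())
    return correct / len(video_labels)
```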
Running the model in test mode and then the commented UCF101 code in video_accuracy.py gives a top-1 accuracy of 0.5527359238699445.
Over the weekend I trained shufflenetv1_2.0x on UCF101 from the pretrained Kinetics model without changing any code in the repo, and achieved 84.9% video accuracy. Please check your code again or re-clone the project.