MCG-NJU/VideoMAE

ssv2 dataset acc

an1018 opened this issue · 2 comments

I'm evaluating on a subset of videos from the SSV2 dataset, and the top-1 and top-5 accuracies are extremely low. What could be the reason for this?
This is the eval command:

OUTPUT_DIR='output_ssv2/eval_lr_5e-4_epoch_50/'
DATA_PATH='dataset/ssv2/csv'
MODEL_PATH='model_pretrained/checkpoint_finetune.pth'

OMP_NUM_THREADS=1 python3.7 -m torch.distributed.launch --nproc_per_node=8 \
    --master_port 1233 --nnodes=1  --node_rank=0 --master_addr=127.0.0.1 \
    run_class_finetuning.py \
    --model vit_base_patch16_224 \
    --data_set SSV2 \
    --nb_classes 174 \
    --data_path ${DATA_PATH} \
    --finetune ${MODEL_PATH} \
    --log_dir ${OUTPUT_DIR} \
    --output_dir ${OUTPUT_DIR} \
    --batch_size 8 \
    --num_sample 1 \
    --input_size 224 \
    --short_side_size 224 \
    --save_ckpt_freq 10 \
    --num_frames 16 \
    --opt adamw \
    --lr 5e-4 \
    --opt_betas 0.9 0.999 \
    --weight_decay 0.05 \
    --epochs 50 \
    --dist_eval \
    --test_num_segment 2 \
    --test_num_crop 3 \
    --eval

Corresponding log:

Auto resume checkpoint: 
Test:  [  0/126]  eta: 0:12:04  loss: 4.2886 (4.2886)  acc1: 0.0000 (0.0000)  acc5: 25.0000 (25.0000)  time: 5.7482  data: 5.2494  max mem: 2425
Test:  [ 10/126]  eta: 0:01:40  loss: 4.9653 (4.9090)  acc1: 0.0000 (5.6818)  acc5: 12.5000 (21.5909)  time: 0.8651  data: 0.4904  max mem: 2425
Test:  [ 20/126]  eta: 0:01:08  loss: 4.9653 (4.9404)  acc1: 0.0000 (6.5476)  acc5: 12.5000 (18.4524)  time: 0.3888  data: 0.0075  max mem: 2425
Test:  [ 30/126]  eta: 0:00:52  loss: 4.9902 (4.9069)  acc1: 0.0000 (7.2581)  acc5: 25.0000 (20.5645)  time: 0.3750  data: 0.0005  max mem: 2425
Test:  [ 40/126]  eta: 0:00:42  loss: 4.8552 (4.8680)  acc1: 0.0000 (7.9268)  acc5: 25.0000 (19.5122)  time: 0.3402  data: 0.0005  max mem: 2425
Test:  [ 50/126]  eta: 0:00:35  loss: 4.7310 (4.8424)  acc1: 0.0000 (7.8431)  acc5: 12.5000 (21.3235)  time: 0.3252  data: 0.0004  max mem: 2425
Test:  [ 60/126]  eta: 0:00:28  loss: 4.8939 (4.8792)  acc1: 0.0000 (7.3770)  acc5: 12.5000 (20.4918)  time: 0.3202  data: 0.0005  max mem: 2425
Test:  [ 70/126]  eta: 0:00:23  loss: 4.8939 (4.8704)  acc1: 0.0000 (7.0423)  acc5: 12.5000 (20.9507)  time: 0.3240  data: 0.0005  max mem: 2425
Test:  [ 80/126]  eta: 0:00:18  loss: 4.9370 (4.8875)  acc1: 0.0000 (7.0988)  acc5: 12.5000 (20.3704)  time: 0.3198  data: 0.0003  max mem: 2425
Test:  [ 90/126]  eta: 0:00:14  loss: 4.9902 (4.8779)  acc1: 0.0000 (7.4176)  acc5: 25.0000 (20.7418)  time: 0.3080  data: 0.0002  max mem: 2425
Test:  [100/126]  eta: 0:00:10  loss: 4.8552 (4.8711)  acc1: 0.0000 (7.5495)  acc5: 25.0000 (20.5446)  time: 0.3047  data: 0.0002  max mem: 2425
Test:  [110/126]  eta: 0:00:06  loss: 4.7596 (4.8756)  acc1: 0.0000 (7.0946)  acc5: 12.5000 (20.3829)  time: 0.3058  data: 0.0002  max mem: 2425
Test:  [120/126]  eta: 0:00:02  loss: 4.8679 (4.8649)  acc1: 0.0000 (7.5413)  acc5: 12.5000 (20.8678)  time: 0.3048  data: 0.0002  max mem: 2425
Test:  [125/126]  eta: 0:00:00  loss: 4.8939 (4.8648)  acc1: 0.0000 (7.3413)  acc5: 12.5000 (20.8333)  time: 0.3049  data: 0.0002  max mem: 2425
Test: Total time: 0:00:47 (0.3735 s / it)
* Acc@1 8.408 Acc@5 21.156 loss 4.799
Start merging results...
Reading individual output files
Computing final results
1344
Accuracy of the network on the 8064 test videos: Top-1: 9.30%, Top-5: 21.80%
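
As a rough sanity check on the log above, the reported numbers can be compared with chance level for 174 classes and with the sample counts implied by the command. This is only a minimal sketch. It assumes that the evaluation set enumerates each video once per (temporal segment, spatial crop) view, which is consistent with the 1344 printed during merging but is not confirmed anywhere in this thread.

```python
import math

# Values copied from the eval command and log above.
nb_classes = 174          # --nb_classes
iterations = 126          # "Test: [125/126]" in the log
batch_size = 8            # --batch_size (per process)
num_procs = 8             # --nproc_per_node
views_per_video = 2 * 3   # --test_num_segment * --test_num_crop

# Chance-level references for a uniform guess over all classes.
chance_loss = math.log(nb_classes)      # cross-entropy of a uniform prediction
chance_top1 = 100.0 / nb_classes        # percent
chance_top5 = 500.0 / nb_classes        # percent

# Sample bookkeeping (assumption: one dataset sample per view of a video).
samples = iterations * batch_size * num_procs
videos = samples // views_per_video

print(f"chance loss  ~ {chance_loss:.3f}")
print(f"chance top-1 ~ {chance_top1:.2f}%  top-5 ~ {chance_top5:.2f}%")
print(f"samples = {samples}, distinct videos = {videos}")
```

Under these assumptions the 8064 in the final line would count views rather than distinct videos, and the reported loss of about 4.8 sits only slightly below the uniform-guess value of about 5.16. The model is therefore above chance (top-1 of 9.30% against roughly 0.57%) but far from a working fine-tuned checkpoint, which would be consistent with, for example, annotation files whose label order does not match the checkpoint. The thread itself does not confirm the cause.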

Hi @an1018! We hope to be of help to you by sharing our annotation files (train.csv, val.csv, test.csv) via this link.

@an1018 Have you solved this problem yet? I'm wondering whether the learning rate needs to be adjusted because the batch_size was reduced.
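
To make the batch-size question concrete, here is a small sketch of the linear scaling rule, in which the learning rate is scaled by the total batch size relative to a reference batch size. The reference value of 256 and the helper name `effective_lr` are assumptions for illustration; whether and how `run_class_finetuning.py` applies such scaling should be checked in the script itself.

```python
def effective_lr(base_lr, per_gpu_batch, num_gpus, update_freq=1, ref_batch=256):
    """Linear scaling rule: lr grows in proportion to the total batch size.

    `ref_batch=256` is an assumed reference value, not taken from this thread.
    """
    total_batch = per_gpu_batch * num_gpus * update_freq
    return base_lr * total_batch / ref_batch


# Values from the command above: --lr 5e-4, --batch_size 8, 8 processes.
lr = effective_lr(5e-4, per_gpu_batch=8, num_gpus=8)
print(f"effective lr: {lr:.3e}")
```

Note that the command above is run with `--eval`, and an evaluation-only run takes no optimizer steps, so the learning rate should not influence the reported accuracy. Scaling of this kind only matters when fine-tuning is actually rerun with a different batch size.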