The result of inference cannot be found in the paper
fake-warrior8 opened this issue · 5 comments
Hi, I reran the inference code (sh run_scripts/sevila/inference/nextqa_infer.sh) and got the following result: {'agg_metrics': 0.649119295436349, 'total': 4996, 'DC': 60.32608695652174, 'CH': 64.62809917355372, 'CW': 63.91959798994975, 'TN': 57.279236276849645, 'TC': 65.26458616010855, 'DL': 89.84962406015038, 'DO': 73.74631268436578, 'TP': 51.35135135135135}. However, I cannot find an accuracy of 64.9 in your paper. Which setting does nextqa_infer.sh correspond to, and why does 64.9 not appear in the paper?
I used the NExT-QA videos and annotations from the original authors' GitHub, the preprocessing code you provided, and the checkpoint you released.
Hello, we use an A6000 GPU for our model; what type of GPU did you use? The model used in the script is built with a zero-shot answerer and a pre-trained localizer (corresponding to Table 2 in the paper, NExT-QA: 63.6%).
I used an A100. Thank you for your reply.
I reran the fine-tuning code on NExT-QA and got {'agg_metrics': 0.8364691753402722, 'total': 4996, 'DC': 75.0, 'CW': 84.92462311557789, 'TN': 78.52028639618138, 'DL': 97.36842105263158, 'CH': 81.81818181818183, 'TC': 81.9538670284939, 'TP': 78.37837837837837, 'DO': 90.2654867256637}, which is much higher than the accuracy of 73.8 reported in your paper.
I used the following script:
python -m torch.distributed.run --nproc_per_node=4 --master_port 29503 train.py \
--cfg-path ./lavis/projects/sevila/train/nextqa.yaml \
--options run.output_dir=${result_dir}${exp_name} \
model.frame_num=4 \
datasets.nextqa.build_info.annotations.train.storage=${train_path} \
datasets.nextqa.build_info.annotations.val.storage=${val_path} \
datasets.nextqa.build_info.annotations.test.storage=${val_path} \
datasets.nextqa.build_info.videos.storage=${video_path} \
datasets.nextqa.vis_processor.train.n_frms=32 \
datasets.nextqa.vis_processor.eval.n_frms=32 \
run.batch_size_train=8 \
run.batch_size_eval=8 \
run.init_lr=3e-5 \
run.max_epoch=10 \
run.warmup_steps=1000 \
run.accum_grad_iters=2 \
model.task='qvh_freeze_loc_train_qa_with_loc_train_qa_vid' \
model.finetuned=${ckpt} \
run.task='videoqa'
The train/val meta files were processed with your script, the checkpoint is the one you released, and the videos were downloaded from the NExT-QA repo.
Oh, I just checked the data preprocessing scripts and found the bug.
The NExT-QA val file was actually the NExT-QA train file in the previous scripts. I have fixed this, and you can double-check it by confirming that NExT-QA val contains 4996 examples (see the quick count below).
Sorry for this bug, and thanks for pointing it out.
Could you please re-try with the corrected NExT-QA val in the zero-shot setting? The result should be close to what our paper reports.
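For reference, a minimal way to confirm the split size (this is a sketch: the path below is a placeholder for wherever your processed annotation file lives, and it assumes the annotations are stored as a JSON list of QA entries):

import json

# Placeholder path -- point this at your processed NExT-QA val annotation file.
val_path = 'nextqa/val.json'

with open(val_path) as f:
    val_anno = json.load(f)

# The corrected val split should contain 4996 question-answer examples.
print(len(val_anno))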
As a note, if you want to re-test the fine-tuned model, you should combine your saved checkpoint with the missing keys from the released checkpoint, since the LAVIS framework does not save the frozen parts of the model.
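A rough sketch of that merge (the file names, and the assumption that each checkpoint stores its weights under a 'model' key, are placeholders rather than the exact SeViLA/LAVIS layout):

import torch

# Placeholder paths -- adjust to your own files.
saved = torch.load('lavis_output/checkpoint_best.pth', map_location='cpu')              # fine-tuned checkpoint (frozen parts missing)
released = torch.load('sevila_checkpoints/sevila_pretrained.pth', map_location='cpu')   # released checkpoint (full weights)

saved_sd = saved.get('model', saved)
released_sd = released.get('model', released)

# Start from the released weights, then overwrite with every key that was actually saved
# during fine-tuning, so the frozen parameters that LAVIS dropped are filled back in.
merged = dict(released_sd)
merged.update(saved_sd)

torch.save({'model': merged}, 'merged_checkpoint.pth')

Pointing model.finetuned at the merged file should then give a complete state dict for evaluation.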