GenjiB/LAVISH

Can't get accuracy on AVE similar to that reported in the paper (81.1%)

liushenme opened this issue · 7 comments

Hi,
We used the same config as in this repo to train the AVE task on a 3090, but the accuracy we got is 78.96%.

python3 main_trans.py --Adapter_downsample=8 --audio_folder=$PATH/raw_audio --batch_size=2 --early_stop=5 --epochs=50 --is_audio_adapter_p1=1 --is_audio_adapter_p2=1 --is_audio_adapter_p3=0 --is_before_layernorm=1 --is_bn=1 --is_fusion_before=1 --is_gate=1 --is_post_layernorm=1 --is_vit_ln=0 --lr=5e-05 --lr_mlp=4e-06 --mode=train --num_conv_group=2 --num_tokens=2 --num_workers=16 --video_folder=$PATH/video_frames --is_multimodal=1 --vis_encoder_type=swin

When we use the config in run_v2.sh, the accuracy is 80.05%, which is still lower than the 81.1% reported in the paper. Is this gap within the expected run-to-run variation?

GenjiB commented

Hi,

Can you also use the processed data I provided? I tried v2.sh on 2-3 servers, and it reaches 80.8-81.1%.

Yes, we used the processed data you provided in the repo. The PyTorch version we used is 1.13.0.

GenjiB commented

Can you also try a few other random seeds? I feel the AVE dataset is too small, so results are somewhat hard to reproduce exactly.
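For anyone trying different seeds, here is a minimal sketch of one common way to fix the seed in a PyTorch training script. It is an assumption, not code from this repo: `set_seed` is a hypothetical helper, and I have not checked whether main_trans.py already exposes a seed option. NumPy and PyTorch are imported optionally so the snippet also runs where they are not installed.

```python
import random

# Optional imports: the helper below seeds whichever libraries are available.
try:
    import numpy as np
except ImportError:
    np = None
try:
    import torch
except ImportError:
    torch = None


def set_seed(seed: int) -> None:
    """Hypothetical helper (not part of this repo): seed the common RNGs."""
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
        # Trade some speed for more repeatable cuDNN behaviour.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


if __name__ == "__main__":
    set_seed(0)
    print(random.random())
```

Calling `set_seed` once at the start of training, before the model and data loaders are built, is the usual pattern. Even with a fixed seed, some GPU ops remain non-deterministic, so small differences between servers can still show up.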

This is the worst case I got with v2.sh.

Let me know if you still cannot reproduce similar results. I'll try to figure out how to address this issue.

image
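To judge whether a single run falls inside the seed-to-seed spread, one option is to collect the final accuracy of several runs and compare against their mean and standard deviation. The sketch below is only an illustration: `summarize_runs` and `within_spread` are hypothetical helpers, the threshold `k=2.0` is an arbitrary assumption, and the example values are just the run_v2.sh numbers quoted in this thread (80.05 from the report, 80.8 and 81.1 from the range above), which is too few for a firm conclusion.

```python
import statistics


def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) of per-run accuracies."""
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0
    return mean, std


def within_spread(value, accuracies, k=2.0):
    """True if `value` lies within k standard deviations of the mean."""
    mean, std = summarize_runs(accuracies)
    return abs(value - mean) <= k * std


if __name__ == "__main__":
    # Values quoted in this thread for run_v2.sh; illustration only.
    runs = [80.05, 80.8, 81.1]
    mean, std = summarize_runs(runs)
    print(f"mean={mean:.2f} std={std:.2f}")
    print("81.1 within spread:", within_spread(81.1, runs))
```

With more seeds per config, the same helpers give a fairer picture of whether a given run is an outlier.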

kaiw7 commented

Hi, could you please share the training logs such as accuracy per epoch? Thank you very much.

GenjiB commented

Hi @kaiw7,
here are the logs.
output.log
output_v2.log

Hi @GenjiB, is the accuracy reported in your paper based on the validation set or the test set? In your log files I can only find the val acc.