Can't get accuracy similar to the AVE result reported in the paper (81.1%)
liushenme opened this issue · 7 comments
Hi,
We used the same config as in this repo to train the AVE task on a 3090, but the accuracy we got is 78.96.
python3 main_trans.py --Adapter_downsample=8 --audio_folder=$PATH/raw_audio --batch_size=2 --early_stop=5 --epochs=50 --is_audio_adapter_p1=1 --is_audio_adapter_p2=1 --is_audio_adapter_p3=0 --is_before_layernorm=1 --is_bn=1 --is_fusion_before=1 --is_gate=1 --is_post_layernorm=1 --is_vit_ln=0 --lr=5e-05 --lr_mlp=4e-06 --mode=train --num_conv_group=2 --num_tokens=2 --num_workers=16 --video_folder=$PATH/video_frames --is_multimodal=1 --vis_encoder_type=swin
When we use the config in run_v2.sh, the accuracy is 80.05, which still differs from the 81.1% reported in the paper. Is this gap within the expected run-to-run variation?
Hi,
Can you also use the processed data I provided? I tried v2.sh on 2~3 servers, and it achieves 80.8-81.1.
Yes, we used the processed data you provided in the repo. The PyTorch version we used is 1.13.0.
Hi, could you please share the training logs such as accuracy per epoch? Thank you very much.
hi @kaiw7,
here are the logs.
output.log
output_v2.log
Hi @GenjiB, is the accuracy in your paper based on the validation set or the test set? In your log file I can only find the val acc.
@zhenwangrs It's the test set, same as here: https://github.com/GenjiB/LAVISH/blob/main/AVE/main_trans_v2.py#L192
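To make the earlier question about run-to-run variation concrete, here is a rough sketch that measures how far each accuracy quoted in this thread falls outside the 80.8-81.1 range reported for v2.sh. The helper `gap_to_reported_range` is a hypothetical name, not part of the LAVISH repo, and comparing the main_trans.py run against the v2.sh range is an assumption made only for illustration.

```python
def gap_to_reported_range(observed, low, high):
    """Return how far `observed` lies outside [low, high], or 0.0 if inside."""
    if observed < low:
        return round(low - observed, 2)
    if observed > high:
        return round(observed - high, 2)
    return 0.0


# Accuracies (%) quoted in this thread; nothing here is newly measured.
reported_low, reported_high = 80.8, 81.1  # range reported for v2.sh
observed = {
    "main_trans.py config": 78.96,  # assumption: compared against the v2.sh range
    "run_v2.sh config": 80.05,
}

for name, acc in observed.items():
    gap = gap_to_reported_range(acc, reported_low, reported_high)
    print(f"{name}: {acc} is {gap} points outside [{reported_low}, {reported_high}]")
```

With the numbers above, the gaps come out to about 1.84 and 0.75 points. Two single runs are too few to judge the variance, so repeating each config over several seeds would give a clearer picture.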