YuanGongND/cav-mae

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".

Python · BSD-2-Clause

Issues

Can you share some visualizations of the results in your paper?
#32 opened 4 months ago by DevKiHyun
2
The calculate_stats function is missing from traintest_ft.py
#31 opened 7 months ago by yt605155624
1
About the video part, could you release the experimental code?
#30 opened 8 months ago by Cb1ock
2
Eval data not used in evaluation stage?
#28 opened 9 months ago by ben2002chou
0
Some problems with finetuning
#27 opened 10 months ago by thirteen-bears
1
Question Regarding stat calculation of dataset
#24 opened a year ago by ben2002chou
3
BOM Considerations When Extracting Your Video frames & Audio
#25 opened a year ago by fujitte
2
Where is contrastive loss implemented? How are the positive and negative samples defined?
#23 opened a year ago by ben2002chou
2
sample_video_extract_list.csv not found
#22 opened a year ago by JackieWang9811
4
Could you release the checkpoints pretrained on Kinetics 400
#21 opened a year ago by qiyue-liang
1
what is the validation set for finetuning?
#19 opened a year ago by thirteen-bears
6
Question for contrastive loss weight in the paper
#20 opened a year ago by sukun1045
3
retrieval evaluation
#15 opened a year ago by sukun1045
3
installation
#18 opened a year ago by chandlerbing65nm
0
Just suggesting a small change to Loading model for Finetuning Example
#16 opened a year ago by ben2002chou
2
Some confusion about this paper and its implementation
#17 opened a year ago by skyzjsx
1
How can I get the video and audio pairs of AudioSet?
#10 opened a year ago by SteveTanggithub
6
Audio Event Classification resulting tensor has all negative values
#14 opened a year ago by rehana-mahfuz
5
Acquiring checkpoints of VGGSound (audio), VGGSound (video)
#13 opened a year ago by mouxingyang
1
Question about some irregular videos in AudioSet-20k
#9 opened a year ago by mouxingyang
6
How to download the MSR-VTT dataset?
#11 opened a year ago by KyeonghaRho
4
Finetune CAVMAE on ESC50
#8 opened a year ago by kaiw7
6
Pretraining cav-mae on K400
#5 opened a year ago by kaiw7
18
Usage of audio-modality components for visual embeddings
#7 opened 2 years ago by gchochla
2
Multi-gpu pre-training
#6 opened 2 years ago by mtran14
4
Which epoch of pre-trained models should I use?
#4 opened 2 years ago by GenjiB
5
Zero-shot Code
#3 opened 2 years ago by zongzi3zz
2
Video Only results on AudioSet-20K
#2 opened 2 years ago by GenjiB
3
Error when loading the CAV-MAE model
#1 opened 2 years ago by pelegshilo
2