# Real-Time Architecture for Audio-Visual Active Speaker Detection
1. This work uses the framework provided by TalkNet, and the data preprocessing is exactly the same.
2. Alternatively, you can download our processed AVA Active Speaker val dataset and our pretrained FSDNet model from Baidu Net Disk:
processed val dataset: https://pan.baidu.com/s/1szRpbKPFLqsmnHLSNHaeRA code: cqr8
pretrained model link: https://pan.baidu.com/s/1Vgnqe-Mnuu-qX2i7H5SAaA code: ghgw
To evaluate the pretrained model, run:
python train.py --evaluation