
Real-time Architecture for Audio-Visual Active Speaker Detection


# Real-Time Architecture for Audio-Visual Active Speaker Detection

Prepare Data

1.	This work uses the framework provided by TalkNet, and the data preprocessing is exactly the same.
2.	Alternatively, you can download our processed AVA Active Speaker validation dataset and our pretrained FSDNet model from Baidu Net Disk:
	processed val dataset: https://pan.baidu.com/s/1szRpbKPFLqsmnHLSNHaeRA code: cqr8
	pretrained model link: https://pan.baidu.com/s/1Vgnqe-Mnuu-qX2i7H5SAaA code: ghgw


python train.py --evaluation
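
Before running the command above, it can help to confirm that the downloaded dataset and model are where you expect them. The following is a minimal sketch, not part of this repository: the helper names `missing_paths` and `build_eval_command` and the two example paths are hypothetical, since this README does not state where `train.py` looks for its inputs. Only the `train.py --evaluation` invocation is taken from the text.

```python
import sys
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


def build_eval_command(python=sys.executable):
    """Build the evaluation command from this README as an argument list."""
    return [python, "train.py", "--evaluation"]


if __name__ == "__main__":
    # Hypothetical locations: adjust to wherever you unpacked the downloads.
    required = ["data/AVA_val", "weights/FSDNet.model"]
    missing = missing_paths(required)
    if missing:
        print("Missing before evaluation:", ", ".join(missing))
    else:
        print("Ready to run:", " ".join(build_eval_command()))
```

The argument list returned by `build_eval_command` can be passed to `subprocess.run` if you prefer to launch the evaluation from a script rather than from the shell.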