How to find when each speaker's speech starts and ends
Closed this issue · 1 comments
roshideen commented
Hi, is it possible to extract the time (or position) at which each speaker's speech starts and ends?
I want to extract the speech of each speaker, so I need to know when the speech matched to each speaker starts and when it ends.
joonson commented
Hi, you can use the frame-wise confidence ('fconfm' inside SyncNetInstance.py) and set a threshold. The index is a frame number, so divide the frame index by 25 (the video frame rate) to get the time in seconds. To make datasets such as LRS and VoxCeleb, we used thresholds of 3 to 4.
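For anyone landing here later, a rough sketch of the thresholding step might look like the following. It assumes you already have 'fconfm' as a 1-D array of per-frame confidences for one face track; the function name `confident_segments` and its parameters are made up for illustration and are not part of the repository.

```python
import numpy as np

def confident_segments(fconfm, threshold=3.0, fps=25.0):
    """Return (start_sec, end_sec) spans where confidence >= threshold.

    Hypothetical helper: `fconfm` is assumed to be a 1-D sequence of
    per-frame confidence values for a single face track.
    """
    active = np.asarray(fconfm, dtype=float) >= threshold
    # Pad with False on both sides so every run has a rising and a falling edge.
    padded = np.concatenate(([False], active, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)   # first frame of each run
    ends = np.flatnonzero(edges == -1)    # one past the last frame of each run
    # Frame index divided by the frame rate gives the time in seconds.
    return [(float(s) / fps, float(e) / fps) for s, e in zip(starts, ends)]
```

Running this once per face track would give a list of speaking intervals per speaker, under the assumption that each track corresponds to one person. The threshold of 3.0 is just the lower end of the 3 to 4 range mentioned above, so tune it for your data.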