How to find when each speaker's speech starts and ends

Question

Closed this issue 4 years ago · 1 comments

Hi, is it possible to extract the time (or position) at which each speaker's speech starts and ends?
I want to extract each speaker's speech separately, so I need to know when the speech matched to a given speaker begins and when it ends.

Answer 1 · 2019-10-31T13:37:12.000Z

Hi, you can use the frame-wise confidence ('fconfm' inside SyncNetInstance.py) and set a threshold on it. The index into it is the frame number, so you divide the frame index by 25 to get the time in seconds. To make datasets such as LRS and VoxCeleb, we used thresholds of 3 to 4.
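
As a rough illustration of the thresholding described above, here is a minimal sketch. It assumes the frame-wise confidence is available as a plain one-dimensional sequence with one value per video frame (for example, 'fconfm' converted to a list) and that you have one such sequence per speaker or face track. The function name `confident_segments`, the default threshold of 3.5, and the sample values are hypothetical and are not part of the SyncNet code.

```python
FPS = 25  # frame rate given in the answer: frame index / 25 = seconds


def confident_segments(frame_conf, threshold=3.5, fps=FPS):
    """Return a list of (start_sec, end_sec) spans where the per-frame
    confidence is at or above the threshold.

    frame_conf is assumed to hold one confidence value per video frame.
    """
    segments = []
    start = None  # frame index where the current span began, if any
    for i, conf in enumerate(frame_conf):
        if conf >= threshold and start is None:
            start = i  # span opens at this frame
        elif conf < threshold and start is not None:
            segments.append((start / fps, i / fps))  # span closes before frame i
            start = None
    if start is not None:
        # Confidence was still above the threshold at the last frame.
        segments.append((start / fps, len(frame_conf) / fps))
    return segments


if __name__ == "__main__":
    # Made-up confidence values, for illustration only.
    sample = [0.5, 1.0, 4.2, 5.0, 4.8, 2.0, 1.0, 3.9, 4.1]
    for start_sec, end_sec in confident_segments(sample):
        print(f"speech from {start_sec:.2f}s to {end_sec:.2f}s")
```

Running this once per speaker's confidence sequence would give that speaker's start and end times. A threshold anywhere in the 3 to 4 range mentioned above can be passed via the `threshold` argument.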