Data_loader.py: why does "utterance = utterance[:, :, 160]" use the number 160?

Question

In Data_loader.py there is the line "utterance = utterance[:, :, 160]". Why is the number 160 used there?

Bovey0809 opened this issue 5 years ago · 1 comment

Answer 1 · 2019-06-04T02:04:19.000Z

The number comes from the paper "Generalized End-to-End Loss for Speaker Verification", which says:

"During inference time, for every utterance we apply a sliding
window of fixed size (lb + ub)/2 = 160 frames with 50% overlap.
We compute the d-vector for each window. The final utterance-wise
d-vector is generated by L2 normalizing the window-wise d-vectors,
then taking the element-wise average (as shown in Figure 4)."

So 160 is the fixed window length in frames, the midpoint of the segment-length bounds lb and ub. Read the paper for the details.
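
To make the quoted procedure concrete, here is a rough NumPy sketch of it; it is not the repository's code. It assumes the spectrogram is shaped (n_mels, n_frames), and the names `sliding_windows`, `utterance_dvector`, and `embed_fn` are hypothetical, with `embed_fn` standing in for whatever model maps one window to a d-vector. As far as I recall, the paper trains on segments whose length is drawn from [lb, ub] = [140, 180] frames, which gives (140 + 180)/2 = 160.

```python
import numpy as np

def sliding_windows(mel, window=160, overlap=0.5):
    """Cut a (n_mels, n_frames) spectrogram into fixed-size windows.

    With window=160 and overlap=0.5 the hop is 80 frames.
    Trailing frames that do not fill a whole window are dropped.
    """
    hop = int(window * (1 - overlap))
    n_frames = mel.shape[1]
    if n_frames < window:
        raise ValueError("utterance is shorter than one window")
    starts = range(0, n_frames - window + 1, hop)
    return np.stack([mel[:, s:s + window] for s in starts])

def utterance_dvector(mel, embed_fn, window=160, overlap=0.5):
    """Average the L2-normalized window-wise d-vectors.

    embed_fn is a placeholder (an assumption, not a real API):
    it takes one (n_mels, window) array and returns a 1-D embedding.
    """
    windows = sliding_windows(mel, window, overlap)
    d = np.stack([embed_fn(w) for w in windows])
    d = d / np.linalg.norm(d, axis=1, keepdims=True)  # L2 normalize each window
    return d.mean(axis=0)  # element-wise average over windows
```

For example, a 400-frame utterance yields windows starting at frames 0, 80, 160, and 240, so four d-vectors are averaged.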