HarryVolek/PyTorch_Speaker_Verification

About Data_loader.py: in "utterance = utterance[:, :, 160]", why do we use the number 160?

Bovey0809 opened this issue · 1 comment


160 is the fixed window length the paper Generalized End-to-End Loss for Speaker Verification uses at inference time. During training, each batch is cropped to a random length between lb = 140 and ub = 180 frames, and the paper then says:

During inference time, for every utterance we apply a sliding
window of fixed size (lb + ub)/2 = 160 frames with 50% overlap.
We compute the d-vector for each window. The final utterance-wise
d-vector is generated by L2 normalizing the window-wise d-vectors,
then taking the element-wise average (as shown in Figure 4).
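A minimal pure-Python sketch of the inference procedure quoted above, to show where 160 comes from and how the windows overlap. It assumes lb = 140 and ub = 180, drops any trailing frames that do not fill a whole window (the quote does not say how the tail is handled), and takes a hypothetical `embed` callable in place of the trained LSTM. `sliding_windows`, `l2_normalize` and `utterance_dvector` are illustrative names, not functions from this repository.

```python
import math

# Assumed training crop bounds (frames); the inference window is their midpoint.
LB, UB = 140, 180
WINDOW = (LB + UB) // 2   # 160 frames
HOP = WINDOW // 2         # 50% overlap -> advance 80 frames per window


def sliding_windows(num_frames, window=WINDOW, hop=HOP):
    """Return (start, end) indices of every full window; leftover tail frames are dropped."""
    if num_frames < window:
        raise ValueError("utterance is shorter than one window")
    return [(s, s + window) for s in range(0, num_frames - window + 1, hop)]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def utterance_dvector(frames, embed):
    """frames: list of per-frame feature vectors.
    embed: hypothetical model mapping a window of frames to a d-vector (list of floats).
    Each window's d-vector is L2-normalized, then all are averaged element-wise."""
    window_vecs = [l2_normalize(embed(frames[s:e])) for s, e in sliding_windows(len(frames))]
    dim = len(window_vecs[0])
    return [sum(v[i] for v in window_vecs) / len(window_vecs) for i in range(dim)]
```

For a 400-frame utterance, `sliding_windows(400)` gives `[(0, 160), (80, 240), (160, 320), (240, 400)]`. The averaged vector is not necessarily unit length, which does not matter if utterances are later compared with cosine similarity, since that measure ignores scale.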