question about voice to face matching evaluation

Question

audio-visual opened this issue 2 years ago · 1 comment

May I ask how the comparison is carried out for the statement "Our results are given by replacing the probe voice embeddings by the embeddings of the generated face." in the paper?
Is it to first compute three feature embeddings using the fixed face embedding network (1: generated face embedding, 2: true speaker face embedding, 3: imposter face embedding), and then compare the cosine similarity between 1 and 2 with the cosine similarity between 1 and 3?

ydwen commented 2 years ago

Exactly!
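
For readers following along, here is a minimal sketch of the comparison confirmed above. It is not code from the paper's repository. It assumes the three embeddings have already been extracted with the fixed face embedding network and are available as 1-D arrays. The function names (`cosine_similarity`, `match_generated_face`, `matching_accuracy`) are hypothetical, and the decision rule (count a trial as correct when the generated face is closer to the true face than to the imposter) together with the accuracy summary are assumptions about how the two similarities are used.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def match_generated_face(generated, true_face, imposter_face):
    """Return True if the generated-face embedding (1) is more similar to the
    true speaker face embedding (2) than to the imposter face embedding (3).

    Assumption: a trial counts as correct when sim(1, 2) > sim(1, 3).
    """
    sim_true = cosine_similarity(generated, true_face)
    sim_imposter = cosine_similarity(generated, imposter_face)
    return sim_true > sim_imposter


def matching_accuracy(trials):
    """Fraction of (generated, true_face, imposter_face) trials matched correctly."""
    correct = sum(match_generated_face(g, t, i) for g, t, i in trials)
    return correct / len(trials)
```

Usage would be to build one `(generated, true_face, imposter_face)` triple per probe and pass the list to `matching_accuracy`.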