NVIDIA/NeMo

Using MSDD model with a different speaker embedding model

MahmoudAshraf97 opened this issue · 2 comments

Hello, is it possible to replace the TitaNet embedding model that is used with MSDD? If so, does that require retraining?
I want to build a pipeline with a different VAD and embedding model but still use the MSDD model.

It would still work to a certain degree without retraining, but you will get much better results if you retrain the MSDD model with the new embedding model.
Beyond the MSDD model itself, tuning the clustering algorithm's scale lengths and scale weights for the new embeddings would also have a large effect on performance.
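
To illustrate why the scale weights matter when the embedding model changes, here is a minimal NumPy sketch of multi-scale affinity fusion: one cosine-similarity matrix per scale, combined by a weighted average before clustering. This is not NeMo's implementation. The function names (`cosine_affinity`, `fused_affinity`) are hypothetical, and the sketch assumes the embeddings from every scale have already been mapped onto the same base-scale segments, so each scale contributes an array of shape `(num_segments, dim)`.

```python
import numpy as np


def cosine_affinity(emb):
    """Pairwise cosine similarity between the rows of emb (num_segments x dim)."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)  # guard against zero-length vectors
    return unit @ unit.T


def fused_affinity(embs_per_scale, scale_weights):
    """Weighted average of the per-scale cosine affinity matrices.

    embs_per_scale: list of arrays, one per scale, each (num_segments x dim).
    scale_weights:  one non-negative weight per scale (hypothetical tuning knob).
    """
    weights = np.asarray(scale_weights, dtype=float)
    if len(embs_per_scale) != len(weights):
        raise ValueError("need exactly one weight per scale")
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    return sum(w * cosine_affinity(e) for w, e in zip(weights, embs_per_scale))


# Toy usage: 3 scales, 6 base-scale segments, 16-dimensional embeddings.
rng = np.random.default_rng(0)
embs = [rng.normal(size=(6, 16)) for _ in range(3)]
affinity = fused_affinity(embs, scale_weights=[1.0, 1.0, 1.0])
print(affinity.shape)
```

A different embedding model may be more or less reliable on short segments than TitaNet, so the weights (and the segment lengths behind each scale) that worked before are not guaranteed to be a good fit and are worth re-tuning on a development set.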