Update: We have a new, more powerful generalization, T-PSDA. The new code repo is here, and a paper describing the new model is available here.
This is a Python implementation of the algorithms described in our Interspeech 2022 paper:
- Please cite this paper if you find our code useful.
Probabilistic Linear Discriminant Analysis (PLDA) is a trainable scoring backend that can be used for tasks such as speaker or face recognition, clustering, and speaker diarization. PLDA uses the self-conjugacy of multivariate Gaussians to obtain closed-form scoring and closed-form EM updates for learning. Some of the Gaussian assumptions of the PLDA model are violated when embeddings are length-normalized, because such embeddings lie on the unit hypersphere.
With PSDA (Probabilistic Spherical Discriminant Analysis), we use Von Mises-Fisher (VMF) distributions instead of Gaussians, because they may give a better model for this kind of data. The VMF is also self-conjugate, so we enjoy the same benefits of closed-form scoring and EM learning.
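To make the self-conjugacy concrete, here is a minimal NumPy/SciPy sketch. It is not the toolkit's API: the function names and the simple generative model (a hidden identity variable with a VMF prior, and VMF-distributed observations around it) are assumptions chosen for illustration. It shows that the posterior of the hidden variable is again VMF, and that a same-versus-different log-likelihood ratio follows in closed form from the normalization constants.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function


def length_normalize(X):
    """Project the rows of X onto the unit hypersphere."""
    return X / np.linalg.norm(X, axis=-1, keepdims=True)


def vmf_log_norm_const(kappa, dim):
    """log C_d(kappa), where VMF(x; mu, kappa) = C_d(kappa) * exp(kappa * mu @ x).

    Requires kappa > 0. Uses log I_nu(kappa) = log(ive(nu, kappa)) + kappa
    to avoid overflow for large kappa.
    """
    nu = dim / 2 - 1
    log_bessel = np.log(ive(nu, kappa)) + kappa
    return nu * np.log(kappa) - (dim / 2) * np.log(2 * np.pi) - log_bessel


def vmf_logpdf(x, mu, kappa):
    """Log density of a unit vector x under VMF(mu, kappa)."""
    return vmf_log_norm_const(kappa, x.shape[-1]) + kappa * (x @ mu)


def vmf_posterior(X, kappa_w, mu0, kappa0):
    """Posterior over the hidden variable z, given the rows of X.

    Illustrative model: z ~ VMF(mu0, kappa0) and x_i | z ~ VMF(z, kappa_w).
    The natural parameters add up, so the posterior is VMF again.
    """
    theta = kappa0 * mu0 + kappa_w * X.sum(axis=0)
    kappa_post = np.linalg.norm(theta)
    return theta / kappa_post, kappa_post


def vmf_llr(x1, x2, kappa_w, mu0, kappa0):
    """Log-likelihood ratio: same hidden variable vs. independent ones."""
    dim = x1.shape[-1]
    prior = kappa0 * mu0
    k1 = np.linalg.norm(prior + kappa_w * x1)
    k2 = np.linalg.norm(prior + kappa_w * x2)
    k12 = np.linalg.norm(prior + kappa_w * (x1 + x2))
    return (vmf_log_norm_const(k1, dim) + vmf_log_norm_const(k2, dim)
            - vmf_log_norm_const(k12, dim) - vmf_log_norm_const(kappa0, dim))


# Toy 3-dimensional example with made-up parameters.
mu0 = np.array([0.0, 0.0, 1.0])
X = length_normalize(np.array([[2.0, 0.0, 0.0], [0.0, 5.0, 0.0]]))
mu_post, kappa_post = vmf_posterior(X, kappa_w=2.0, mu0=mu0, kappa0=1.0)
same = vmf_llr(X[0], X[0], kappa_w=10.0, mu0=mu0, kappa0=1.0)
diff = vmf_llr(X[0], -X[0], kappa_w=10.0, mu0=mu0, kappa0=1.0)
```

In this toy setup, a pair of identical vectors scores positive and a pair of opposite vectors scores negative. For very high dimensions combined with small concentrations, the Bessel term can underflow, so a production implementation would need a more careful evaluation of the log normalizer.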
For now, everything is implemented in NumPy and SciPy. (The EM algorithm has closed-form updates, so we don't need automatic differentiation for now.) The demo code uses our PYLLR toolkit to evaluate accuracy and calibration.
We will neaten the installation procedure later. For now, install PYLLR and then put the directory of this toolkit in your Python path. Then run demo.py to check that it works, and read the demo code to see how to use the toolkit for training and scoring.
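One way to put a directory on the Python path from inside a script or notebook is sketched below. The helper name and the directory `path/to/PSDA` are hypothetical placeholders; substitute the location where you placed this toolkit.

```python
import sys
from pathlib import Path


def add_to_python_path(directory):
    """Prepend a directory to sys.path unless it is already listed."""
    resolved = str(Path(directory).expanduser().resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Hypothetical location: replace with the directory that holds this toolkit.
toolkit_dir = add_to_python_path("path/to/PSDA")
```

Setting the `PYTHONPATH` environment variable to the same directory before starting Python has the same effect and also applies when running demo.py from the command line.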