RaviSoji/plda

How to get scores from PLDA

Closed this issue · 8 comments

@RaviSoji How could I get the score which the PLDA assigns to a test input vector?

Right now I am using calc_logp_pp_categories for getting scores, but they are log probabilities and not raw scores.

Thanks for writing!

By scores, do you mean the latent features? If so, check out cell 28 in the MNIST demo. I think the following line is what you are looking for:

U_model = classifier.model.transform(training_data, from_space='D', to_space='U_model')

Let me know whether this is or isn't what you are looking for, and then we can go from there.

Yes, I am obtaining U_model in that way, and then I am calculating the log probabilities using U_model as input. Does it make sense to you?

I am also very curious about the normalisation that you apply in calc_logp_pp_categories. Is the normalisation computed using only the training data?

@RaviSoji

Just to give you some context, I am using this classifier to detect spoofing attacks. I need to get the scores of genuine utterances and of spoofing-attack utterances.

This is how I am using it:

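import numpy as np

# Stack the genuine and spoofed inputs (and their labels) into one evaluation set.
z = np.concatenate((z_genuine, z_spoof))
y = np.concatenate((Y_genuine, Y_spoof))
# Project the data from data space 'D' into the latent space 'U_model'.
U_model = clf.model.transform(z, from_space='D', to_space='U_model')
# False: do not normalize the log densities.
scores, K = clf.calc_logp_pp_categories(U_model, False)
print(getEER(scores, y))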

It looks like by scores, you mean the unnormalized log densities of test data being generated by each category in the training set. In that case, the following line that you wrote looks correct: scores, K = clf.calc_logp_pp_categories(U_model, False).
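Because these scores come as one log density per training category, an EER computation that expects a single score per utterance needs them reduced first. The following is only a rough sketch of one way to do that, assuming exactly two categories and assuming the columns of the score array follow the order of the returned category labels. The helpers `llr_scores` and `equal_error_rate` and all of the numbers are hypothetical illustrations; they are not part of plda and are not the getEER function used above.

```python
import numpy as np

def llr_scores(log_densities, labels, target):
    """One score per row: log density under `target` minus the other category."""
    log_densities = np.asarray(log_densities, dtype=float)
    labels = np.asarray(labels)
    target_col = int(np.flatnonzero(labels == target)[0])
    other_col = 1 - target_col  # assumption: exactly two categories
    return log_densities[:, target_col] - log_densities[:, other_col]

def equal_error_rate(scores, is_target):
    """Approximate the EER by sweeping a threshold over the observed scores."""
    scores = np.asarray(scores, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    thresholds = np.unique(scores)  # sorted, distinct candidate thresholds
    # False rejection rate: target trials scored below the threshold.
    frr = np.array([(scores[is_target] < t).mean() for t in thresholds])
    # False acceptance rate: non-target trials scored at or above the threshold.
    far = np.array([(scores[~is_target] >= t).mean() for t in thresholds])
    best = np.argmin(np.abs(far - frr))  # point where the two rates are closest
    return (far[best] + frr[best]) / 2.0

# Made-up log densities: 4 utterances, columns ordered as in `categories`.
categories = np.array(['genuine', 'spoof'])
log_densities = np.array([[-1.0, -4.0], [-2.0, -3.0], [-5.0, -1.0], [-3.0, -2.5]])
truth = np.array(['genuine', 'genuine', 'spoof', 'spoof'])

detection_scores = llr_scores(log_densities, categories, target='genuine')
eer = equal_error_rate(detection_scores, truth == 'genuine')
```

With this toy data the two classes are perfectly separated, so the sketch reports an EER of zero. On real data the threshold sweep only approximates the point where the two error rates cross.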

If you normalize the log densities, for each test datum, you can think of the normalization being done by "(1) exponentiating the log density under each category, (2) summing up those densities, and (3) then dividing all of the densities for that datum by this total density". Of course, for numerical stability, this is done in log space.
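To make those three steps concrete, here is a minimal NumPy sketch of the same normalization carried out in log space. It illustrates the idea only and is not the library's actual code; the function name `normalize_log_densities` and the example values are made up.

```python
import numpy as np

def normalize_log_densities(log_densities):
    """Normalize each row so that its exponentiated entries sum to 1."""
    log_densities = np.asarray(log_densities, dtype=float)
    # Subtract the row maximum before exponentiating so exp() cannot
    # underflow to zero for every category at once.
    row_max = log_densities.max(axis=-1, keepdims=True)
    shifted = log_densities - row_max
    # Log of the total density per row (the log-sum-exp trick).
    log_total = row_max + np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Dividing by the total density is a subtraction in log space.
    return log_densities - log_total

# Made-up log densities: 2 test data (rows) under 2 categories (columns).
example = np.array([[-1050.0, -1052.0], [-3.0, -1.0]])
log_probs = normalize_log_densities(example)
probs = np.exp(log_probs)  # each row now sums to 1
```

The first row shows why log space matters: exp(-1050) underflows to 0 in double precision, so exponentiating first and dividing afterwards would give 0/0. If SciPy is available, scipy.special.logsumexp computes the same per-row log total.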

Let me know if that isn't clear!

I just took another look at this and realized that I may still be interpreting the question incorrectly.
If all you want to do is classify new data as "spoof" or "not spoof" and obtain the accompanying probabilities, use the predict() method in the classifier, and set normalize_logps to True:

predict(data, space='D', normalize_logps=True)

This will save you the effort of having to transform the data manually and it will return log probabilities.

The equations are actually written in the docstrings, so I am going to close this issue for now, but feel free to write back if you still need help!

Yes, it all makes sense to me now. Thanks for your prompt responses and for this library!

You're welcome, and good luck!