js05212/PyTorch-for-NPN

MNIST classification

janisgp opened this issue · 1 comment

I am curious how the classification setting works. You mention in your paper that you use the cross-entropy loss.

Do you use a softmax as the final layer? How do you propagate the variance through the softmax?

Hi,

Thanks for your interest, and good question! For MNIST classification, we use an elementwise sigmoid followed by cross-entropy. The output mean of the sigmoid takes both the mean and the variance from the previous layer (the pre-activation linear layer) as input. This is how both the mean and the variance can affect the final prediction.
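For intuition, here is a minimal PyTorch sketch of such a variance-aware sigmoid. It uses the standard probit approximation E[sigmoid(x)] ≈ sigmoid(μ / √(1 + π σ² / 8)) for the output mean, and the α = 4 − 2√2, β = −log(√2 + 1) trick for the output variance, as in the Gaussian NPN formulation; the function name `npn_sigmoid` and the exact constants are a reconstruction for illustration, not copied from this repository's code.

```python
import math
import torch

def npn_sigmoid(mean, var):
    """Propagate a Gaussian (mean, var) elementwise through a sigmoid.

    A sketch following the Gaussian NPN approximations; constants are
    assumptions, not read from this repo.
    """
    zeta2 = math.pi / 8.0                    # probit-approximation constant
    alpha = 4.0 - 2.0 * math.sqrt(2.0)       # variance-approximation constants
    beta = -math.log(math.sqrt(2.0) + 1.0)

    # Output mean: sigmoid of the variance-scaled pre-activation mean.
    # A larger input variance pulls the output mean toward 0.5,
    # so the prediction reflects the input uncertainty.
    out_mean = torch.sigmoid(mean / torch.sqrt(1.0 + zeta2 * var))

    # Output variance: approximate the second moment with the same trick
    # under rescaled (alpha, beta), then subtract the squared output mean.
    second_moment = torch.sigmoid(
        alpha * (mean + beta) / torch.sqrt(1.0 + zeta2 * alpha ** 2 * var)
    )
    out_var = torch.clamp(second_moment - out_mean ** 2, min=1e-8)
    return out_mean, out_var

# Usage: 10-class pre-activation means and variances from a linear layer.
m, s = torch.zeros(10), torch.ones(10)
out_m, out_s = npn_sigmoid(m, s)  # out_m is 0.5 everywhere for zero means
```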

There have also been follow-ups to NPN (e.g., work from ICLR 2018, if I remember correctly) that try to extend it with a softmax layer.

Hao