Harry24k/bayesian-neural-network-pytorch

Uncertainty estimation


Hi,

I am interested in using a BNN for uncertainty estimation instead of just getting a random output every time. Do you have a demo showing how the uncertainty can be computed? With your code, would it possibly be like using dropout and running inference multiple times, then computing the variance of the multiple outputs?

Thanks

Hi @yunshengtian

If you look at the regression example, several "predictions" are generated from the same model after training. As you mentioned, one solution is to compute the variance over a set of K inferences; with a large enough K you will get an estimate of the uncertainty. This is no different from performing Monte Carlo sampling and using those samples to obtain a mean and standard deviation for your population of predictions.
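
A minimal sketch of that procedure (training loop omitted; the architecture and K are just placeholders in the style of the repo's regression example):

```python
import torch
import torch.nn as nn
import torchbnn as bnn

# Small Bayesian regression model, as in the repo's regression demo.
model = nn.Sequential(
    bnn.BayesLinear(prior_mu=0, prior_sigma=0.1, in_features=1, out_features=100),
    nn.ReLU(),
    bnn.BayesLinear(prior_mu=0, prior_sigma=0.1, in_features=100, out_features=1),
)

@torch.no_grad()
def predict_with_uncertainty(model, x, k=20):
    # Each forward pass draws a fresh weight sample from the posterior,
    # so k passes give k Monte Carlo samples of the predictive distribution.
    preds = torch.stack([model(x) for _ in range(k)])  # shape: (k, N, 1)
    return preds.mean(dim=0), preds.std(dim=0)

x = torch.linspace(-3, 3, 100).unsqueeze(1)
mean, std = predict_with_uncertainty(model, x, k=20)  # std is the uncertainty estimate
```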

You need to tune your K: a larger one becomes computationally expensive, while a K that is too small can lead to biased estimates. Of course, the sweet spot can be model/dataset specific, but most of the references I have read use K = 15 to 25. In my experiments, the uncertainty estimates start to converge at K ≈ 9 or higher.

I would recommend, if possible, plotting the uncertainty against your predictive variables (or a reduced set of them). You could have a model generating heavily skewed uncertainty estimates, and maybe that is expected to happen with your dataset.

Best wishes

Thanks @cappelletto for your detailed explanation! This is really helpful.

Before closing this issue, I am interested in hearing your thoughts on an alternative option: instead of running inference K times with a "full" Bayesian network, would it be a good idea to make only the last layer Bayesian (the last layer is bnn.BayesLinear and all layers before it are nn.Linear)? In this case, we can analytically compute the output variance from the posterior sigma of the last layer. Have you tried this before, and if so, would it perform similarly to a full Bayesian network?
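
To make it concrete, here is a rough sketch of what I have in mind (I am assuming the BayesLinear posterior is stored as weight_mu/weight_log_sigma and bias_mu/bias_log_sigma, following the library source; please correct me if the attribute names differ):

```python
import torch
import torch.nn as nn
import torchbnn as bnn

# Deterministic body, Bayesian head: only the last layer is stochastic.
body = nn.Sequential(nn.Linear(1, 100), nn.ReLU(), nn.Linear(100, 100), nn.ReLU())
head = bnn.BayesLinear(prior_mu=0, prior_sigma=0.1, in_features=100, out_features=1)

@torch.no_grad()
def analytic_mean_var(x):
    # For y = W h + b with independent Gaussian entries in W and b, and a
    # deterministic feature vector h:
    #   E[y]   = mu_W h + mu_b
    #   Var[y] = (sigma_W ** 2) (h ** 2) + sigma_b ** 2
    h = body(x)
    w_sigma = torch.exp(head.weight_log_sigma)
    b_sigma = torch.exp(head.bias_log_sigma)
    mean = h @ head.weight_mu.t() + head.bias_mu
    var = (h ** 2) @ (w_sigma ** 2).t() + b_sigma ** 2
    return mean, var
```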

Much appreciated!

Dear @yunshengtian

If I understand correctly, what you propose is a model with a single Bayesian layer at the very output, and I assume the rationale is to be able to compute the output variance analytically. Is that right? I can understand your motivation (i.e. directly calculating the model variance from the uncertainty of your last layer). Even better, you could move the Bayesian layer deeper into your stack and still be able to propagate the uncertainty from any hidden layer to the output. This is easy to do as long as the half of the model after the Bayesian layer is linear (see the sketch after the list below).

Now, this brings me to your last question: does it behave like a full Bayesian network? I have to admit I do not feel fully confident giving you an absolute answer. However, I can share some of the information I have found so far in published works and in my own experiments:

  • To obtain a Bayesian model you do not need to convert all of your layers into Bayesian ones. (I do not have the references at hand, but this is a quite common simplification that reduces computation costs by keeping only one Bayesian layer; it is related to model decomposition.)
  • If the distribution of your targets/predictions is expected to be Gaussian, then you will be fine using a Gaussian last layer. Any deviation from this often-overlooked assumption will hurt both the generalization capability and the training performance of your model. Non-Normal data will require either a distribution transformation or letting your model learn non-Gaussian distributions (e.g. adding a non-linear layer after the Bayesian layer).
  • Is there any specific reason you require an analytical solution for your uncertainty calculation? In Bayesian learning, the model parameters are modeled as random variables, and estimating them exactly can be too expensive or intractable. You can use approximate methods such as Markov chain Monte Carlo, pure Monte Carlo sampling, or variational inference (ELBO). Regardless of the method, you obtain a posterior distribution over the model parameters and, with it, an estimate of your prediction uncertainty.
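
To illustrate the propagation point from above: as long as every layer after the Bayesian one is a plain nn.Linear, the element-wise mean and variance can be pushed through in closed form. A sketch, under the (strong) assumption that the components of z are independent:

```python
import torch

def propagate_linear(mu_z, var_z, linear):
    # Push mean/variance of a random activation z through a deterministic
    # layer y = W z + b:
    #   E[y]   = W mu_z + b
    #   Var[y] = (W ** 2) var_z   (cross-covariances of z assumed zero)
    W, b = linear.weight, linear.bias
    return mu_z @ W.t() + b, var_z @ (W ** 2).t()
```

Note that any non-linearity between the Bayesian layer and the output breaks this closed form, which is exactly the trade-off in the second bullet above.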

I have used small to mid-sized BNNs with only one (hidden) Bayesian layer and a few non-linear ones, with results as good as going all-in with multiple Bayesian layers, while saving training/prediction time.

I hope this bit of info helps you to find what you are looking for.

Cheers,
Jose

Hi Jose,

You understood my question correctly, and thank you for kindly sharing the information! This is absolutely helpful!

Thanks,
Yunsheng