xu-ji/information-bottleneck

MI between the model and training dataset


First of all, thank you for sharing such an influential work with the public.
The findings presented in your work may provide a theoretical basis for my empirical results, so I have a couple of questions, mainly about how the MI between the model and the training dataset is calculated:
[image: equation for the MI between the model and the training dataset]

If I am not mistaken, this quantity is computed in the "compute_MI_theta_D_single_seed_jensen" function found in the following file:

[image: screenshot of the function]

The 'data_instances' argument in the screenshot above is only ever passed as 'list(range(5))'. Does that mean that you are using 1 copy of the 'swag' model to compute the first term in the equation above, $$\log p(w^j|s)$$, with the value stored in the 'log_posterior' variable,

and 4 copies of the same model to estimate the second term, $$\frac{1}{|D|} \sum_{s^{'} \in D} \log p(w^j| s^{'})$$, with the value stored in the 'log_prior' variable?
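
To make my reading concrete, here is a minimal sketch of how I understand the estimate. It is not taken from the repository: the function name `mi_estimate_sketch`, the matrix layout `log_p[j][i]` (the log-density of sample $$w^j$$ under the posterior fitted on dataset $$s_i$$), and the `exclude_self` flag are all assumptions I made for illustration. The flag switches between averaging the second term over every dataset in $$D$$, as the equation is written, and averaging over only the other datasets, which is what "1 copy plus 4 copies" would imply.

```python
def mi_estimate_sketch(log_p, exclude_self=False):
    """Hypothetical sketch, not the repository's code.

    log_p[j][i] is assumed to hold log p(w^j | s_i), where w^j was drawn
    from the posterior fitted on dataset s_j.
    """
    n = len(log_p)
    if exclude_self and n < 2:
        raise ValueError("need at least two datasets when exclude_self=True")

    total = 0.0
    for j, row in enumerate(log_p):
        # First term: log-density of w^j under its own posterior.
        log_posterior = row[j]
        # Second term: average log-density of w^j across datasets,
        # optionally leaving out the dataset that produced w^j.
        terms = [v for i, v in enumerate(row) if not (exclude_self and i == j)]
        log_prior = sum(terms) / len(terms)
        total += log_posterior - log_prior

    # Average the per-sample differences over all samples.
    return total / n


# Toy values, chosen only to show the two readings give different numbers.
toy_log_p = [[0.0, -2.0], [-4.0, 0.0]]
print(mi_estimate_sketch(toy_log_p))                     # average over all of D
print(mi_estimate_sketch(toy_log_p, exclude_self=True))  # average over the others
```

If the code follows the first reading, all 5 instances would contribute to 'log_prior'; if it follows the second, only the other 4 would.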

If the models represented by the data instances are not the same, could you please explain how they differ and point to the part of the code where that difference is implemented?

Thanks a lot in advance.