Question about the initialization of theta0 in ABML
ruinnlll opened this issue · 4 comments
If I understand it right, the distribution of theta0 is not a vanilla Normal distribution as written in the application details of the original paper; instead it is a normal distribution multiplied by a gamma distribution. However, in your code I am confused about how that is represented, since it seems that you just initialize the mean and logsigma of theta0. How did you implement this part? I am struggling to implement an algorithm similar to ABML and really really hope you can give some help. thx:)
Hi VespaLan,
The ABML method models the parameters of the base model with a multivariate normal distribution with a diagonal covariance matrix: w ~ p(w | theta), where theta = (mean, std). To obtain theta, we need to initialize it and then train. In addition, the original paper included a hyper-prior p(theta) - a normal-gamma distribution (often seen in statistics as a conjugate prior) - to regularize theta.
In my implementation, I was lazy and did not include this hyper-prior. What I did instead was to set L2-regularization for theta in the optimizer; in that case, the hyper-prior is effectively a "normal-normal" distribution. If you want to include the normal-gamma hyper-prior as stated in the original ABML paper, you can add it to the meta_loss right before the meta_loss.backward() call.
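In case it helps, here is a minimal NumPy sketch of that idea: a log-density for a Normal-Gamma prior over (mean, precision), whose negative value would be added to the meta-loss before backprop. The hyper-parameter names and values (mu0, kappa, a, b) are placeholders for illustration, not the paper's settings, and in a real PyTorch implementation you would write the same formula with torch ops so gradients flow.

```python
import math
import numpy as np

def log_normal_gamma(mu, tau, mu0=0.0, kappa=1.0, a=2.0, b=2.0):
    """Log density of a Normal-Gamma prior over (mu, tau):
    tau ~ Gamma(a, rate=b) and mu | tau ~ Normal(mu0, 1 / (kappa * tau)).
    Hyper-parameters here are illustrative, not the ABML paper's values."""
    log_gamma_pdf = (a * math.log(b) - math.lgamma(a)
                     + (a - 1.0) * np.log(tau) - b * tau)
    log_normal_pdf = (0.5 * (np.log(kappa * tau) - math.log(2.0 * math.pi))
                      - 0.5 * kappa * tau * (mu - mu0) ** 2)
    return log_gamma_pdf + log_normal_pdf

# Penalizing one entry of theta = (mean, log_sigma): add the negative
# log hyper-prior to the meta-loss just before backpropagation, e.g.
#   meta_loss = task_loss - log_normal_gamma(mean, tau)
#   meta_loss.backward()
mean, log_sigma = 0.3, -1.0        # a single theta entry, for illustration
tau = math.exp(-2.0 * log_sigma)   # precision = 1 / sigma**2
penalty = -log_normal_gamma(mean, tau)
```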
Let me know if you still have any further concerns about the implementation.
Many thanks for your reply! I am new to the field of meta-learning and that helps a lot. Yes, I still have questions about the details. Here is my intuition about implementing the normal-gamma distribution, and I wonder if it is correct: I found that in the paper, alpha and beta are not updated the way theta is. So perhaps I just have to randomly sample these parameters for the initialization of theta? (Or should I sample them every time theta updates, which would make them act like a weight decay?)
Hi @VespaLan
I am not sure I understand you. The parameters a and b of the normal-gamma hyper-prior (alpha and beta in your case) are hyper-parameters, and they are chosen by hand (see Table 4 in the Appendix of the ABML paper). To initialize theta, you can either sample it from the hyper-prior or sample it randomly.
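For the first option, a minimal sketch of sampling theta0 = (mean, log_sigma) from the Normal-Gamma hyper-prior might look like the following. The function name and the hyper-parameter defaults are my own placeholders, not values from the paper or the repo.

```python
import numpy as np

def sample_theta0(size, mu0=0.0, kappa=1.0, a=2.0, b=2.0, rng=None):
    """Draw an initial theta0 = (mean, log_sigma) from a Normal-Gamma
    hyper-prior: tau ~ Gamma(a, rate=b), mean | tau ~ Normal(mu0, 1/(kappa*tau)).
    Hyper-parameter values here are placeholders, not the paper's Table 4."""
    if rng is None:
        rng = np.random.default_rng()
    tau = rng.gamma(shape=a, scale=1.0 / b, size=size)  # NumPy takes scale = 1/rate
    mean = rng.normal(mu0, 1.0 / np.sqrt(kappa * tau))
    log_sigma = -0.5 * np.log(tau)                      # sigma = tau ** -0.5
    return mean, log_sigma

# Example: initialize theta0 for a layer with 64 weights.
mean0, log_sigma0 = sample_theta0((64,), rng=np.random.default_rng(0))
```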
Oh, I didn't notice that alpha and beta are given in the appendix... I see. Now I fully understand this. Thank you so much for the quick and helpful response!