Question about the initialization of theta0 in ABML
ruinnlll opened this issue · 4 comments
If I understand it right, the distribution of theta0 is not a vanilla Normal distribution as written in the application details of the original paper; instead it is a normal distribution multiplied by a gamma distribution. However, in your code I am confused about how that is represented, since it seems that you just initialize the mean and logsigma of theta0. How did you implement this part? I am struggling to implement an algorithm similar to ABML and really really hope you can give some help. thx:)
Hi VespaLan,
The ABML method models the parameters of the base model with a multivariate normal distribution with a diagonal covariance matrix: w ~ p(w | theta), where theta = (mean, std). To obtain theta, we need to initialize it and then train. In addition, the original paper included a hyper-prior p(theta) - a normal-gamma distribution (often seen in statistics as a conjugate prior) - to regularize theta.
In my implementation, I was lazy and did not include this hyper-prior. What I did instead was to set L2-regularization for theta in the optimizer; in that case, the hyper-prior is effectively a "normal-normal" distribution. If you want to include the normal-gamma hyper-prior as stated in the original ABML paper, you can add it to the meta_loss right before the meta_loss.backward() call.
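In case it helps, here is a minimal NumPy sketch of that idea: a log-density for a Normal-Gamma prior over (mean, precision), whose negative value would be added to the meta-loss before backprop. The hyper-parameter names and values (mu0, kappa, a, b) are placeholders for illustration, not the paper's settings, and in a real PyTorch implementation you would write the same formula with torch ops so gradients flow.

```python
import math
import numpy as np

def log_normal_gamma(mu, tau, mu0=0.0, kappa=1.0, a=2.0, b=2.0):
    """Log density of a Normal-Gamma prior over (mu, tau):
    tau ~ Gamma(a, rate=b) and mu | tau ~ Normal(mu0, 1 / (kappa * tau)).
    Hyper-parameters here are illustrative, not the ABML paper's values."""
    log_gamma_pdf = (a * math.log(b) - math.lgamma(a)
                     + (a - 1.0) * np.log(tau) - b * tau)
    log_normal_pdf = (0.5 * (np.log(kappa * tau) - math.log(2.0 * math.pi))
                      - 0.5 * kappa * tau * (mu - mu0) ** 2)
    return log_gamma_pdf + log_normal_pdf

# Penalizing one entry of theta = (mean, log_sigma): add the negative
# log hyper-prior to the meta-loss just before backpropagation, e.g.
#   meta_loss = task_loss - log_normal_gamma(mean, tau)
#   meta_loss.backward()
mean, log_sigma = 0.3, -1.0        # a single theta entry, for illustration
tau = math.exp(-2.0 * log_sigma)   # precision = 1 / sigma**2
penalty = -log_normal_gamma(mean, tau)
```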
Let me know if you still have any further concerns about the implementation.
Many thanks for your reply! I am new to the field of meta-learning and that helps a lot. Yes, I still have questions about the details. Here is my intuition about implementing the normal-gamma distribution, and I wonder if it is correct: I found that in the paper, alpha and beta are not updated the way theta is. So perhaps I just have to randomly sample these parameters for the initialization of theta? (Or should I sample them every time theta updates, which would make them act like a weight decay?)
Hi @VespaLan
I am not sure I understand you. The parameters a and b of the normal-gamma hyper-prior (alpha and beta in your case) are hyper-parameters, and they are chosen by hand (see Table 4 in the Appendix of the ABML paper). To initialize theta, you can either sample it from the hyper-prior or sample it randomly.
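For the first option, a minimal sketch of sampling theta0 = (mean, log_sigma) from the Normal-Gamma hyper-prior might look like the following. The function name and the hyper-parameter defaults are my own placeholders, not values from the paper or the repo.

```python
import numpy as np

def sample_theta0(size, mu0=0.0, kappa=1.0, a=2.0, b=2.0, rng=None):
    """Draw an initial theta0 = (mean, log_sigma) from a Normal-Gamma
    hyper-prior: tau ~ Gamma(a, rate=b), mean | tau ~ Normal(mu0, 1/(kappa*tau)).
    Hyper-parameter values here are placeholders, not the paper's Table 4."""
    if rng is None:
        rng = np.random.default_rng()
    tau = rng.gamma(shape=a, scale=1.0 / b, size=size)  # NumPy takes scale = 1/rate
    mean = rng.normal(mu0, 1.0 / np.sqrt(kappa * tau))
    log_sigma = -0.5 * np.log(tau)                      # sigma = tau ** -0.5
    return mean, log_sigma

# Example: initialize theta0 for a layer with 64 weights.
mean0, log_sigma0 = sample_theta0((64,), rng=np.random.default_rng(0))
```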
Oh, I didn't notice that alpha and beta are given in the appendix... I see. Now I fully understand this. Thank you so much for the quick and helpful response!