Questions about the implementation for inner loop
Closed this issue · 3 comments
hytseng0509 commented
Hi,
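Thanks for sharing the code. I have a question about the implementation of the inner loop
<https://github.com/katerakelly/pytorch-maml/blob/f4d36caa4cdd43d28332befaf88b4df8551c0f34/src/inner_loop.py#L57>:
Is there any reason for the special case of i == 0? Can we just use fast_weights for i == 0?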
Thanks!
katerakelly commented
Hi,
It's been a while since I wrote this, but I believe the reason has to do
with computing the meta-gradient, here
<https://github.com/katerakelly/pytorch-maml/blob/f4d36caa4cdd43d28332befaf88b4df8551c0f34/src/inner_loop.py#L76>.
I think if you use fast_weights for i == 0, then PyTorch doesn't have the
model parameters in the graph it constructed during the train updates, and
so it's not able to compute the gradient w.r.t. them. Essentially, I just
needed to retain a pointer to the original model parameters, but maybe that
wasn't the most intuitive way to do it.
--
Kate Rakelly
UC Berkeley EECS PhD Student
rakelly@eecs.berkeley.edu
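The concern in the explanation above can be illustrated with a minimal sketch. It is not the repository's code: the single weight vector `w`, the toy data, `loss_fn`, and `step_size` are all hypothetical stand-ins. It contrasts fast weights that start as detached copies (so the original parameter never enters the graph) with a first step taken from the parameter itself.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the model parameters: a single weight vector.
w = torch.nn.Parameter(torch.randn(3))
x, y = torch.randn(5, 3), torch.randn(5)
step_size = 0.1


def loss_fn(weight):
    # Mean squared error of a linear model using the given weight tensor.
    return ((x @ weight - y) ** 2).mean()


# Case A: the fast weights start as a detached copy of w,
# so w itself is never part of the graph built by the inner update.
fast = w.detach().clone().requires_grad_(True)
(grad,) = torch.autograd.grad(loss_fn(fast), fast, create_graph=True)
fast = fast - step_size * grad
# allow_unused=True makes autograd return None instead of raising an error.
(meta_grad_detached,) = torch.autograd.grad(loss_fn(fast), w, allow_unused=True)

# Case B: the first inner step differentiates through w itself,
# so the updated weights are connected to w in the graph.
(grad,) = torch.autograd.grad(loss_fn(w), w, create_graph=True)
fast = w - step_size * grad
(meta_grad_attached,) = torch.autograd.grad(loss_fn(fast), w)

print(meta_grad_detached)
print(meta_grad_attached.shape)
```

In case A there is no path from the final loss back to `w`, so no meta-gradient is available; in case B there is. Whether the fast weights in the repository are connected to the original parameters at i == 0 depends on how they are initialized, which is the point raised later in this thread.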
hytseng0509 commented
Thanks for the explanation!
frajem commented
Hi,
Sorry to reopen this discussion. Since you define fast_weights to be the same as the original weights here (pytorch-maml/src/inner_loop.py, line 53 in f4d36ca):
fast_weights = OrderedDict((name, param) for (name, param) in self.named_parameters())
for i in range(self.num_updates):
there should already be a "pointer" to the original model parameters for computing the meta-gradient, so there should be no need to special-case i == 0. Do you think my observation is correct?
Thanks.
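One way to check this observation is a small experiment, sketched below under stated assumptions. This is not the repository's code: the `torch.nn.Linear` model, the random data, and the helpers `functional_loss` and `meta_grads` are hypothetical. The sketch runs the inner loop once with a special case for i == 0 and once without, then compares the meta-gradients with respect to the original parameters.

```python
from collections import OrderedDict

import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy model and data, used only for this comparison.
model = torch.nn.Linear(4, 1)
x_train, y_train = torch.randn(8, 4), torch.randn(8, 1)
x_val, y_val = torch.randn(8, 4), torch.randn(8, 1)


def functional_loss(weights, x, y):
    # Forward pass that uses the tensors in `weights` instead of model's own.
    pred = F.linear(x, weights["weight"], weights["bias"])
    return F.mse_loss(pred, y)


def meta_grads(special_case_first_step, num_updates=3, step_size=0.1):
    # Initially the dict holds the model's own Parameter objects, not copies.
    fast_weights = OrderedDict(model.named_parameters())
    for i in range(num_updates):
        if i == 0 and special_case_first_step:
            # First step goes through the module and its own parameters.
            loss = F.mse_loss(model(x_train), y_train)
            grads = torch.autograd.grad(
                loss, list(model.parameters()), create_graph=True
            )
        else:
            # Every step (including i == 0) uses the fast weights.
            loss = functional_loss(fast_weights, x_train, y_train)
            grads = torch.autograd.grad(
                loss, list(fast_weights.values()), create_graph=True
            )
        fast_weights = OrderedDict(
            (name, w - step_size * g)
            for (name, w), g in zip(fast_weights.items(), grads)
        )
    val_loss = functional_loss(fast_weights, x_val, y_val)
    # Meta-gradient: validation loss w.r.t. the original parameters.
    return torch.autograd.grad(val_loss, list(model.parameters()))


with_special_case = meta_grads(special_case_first_step=True)
without_special_case = meta_grads(special_case_first_step=False)
same = all(
    torch.allclose(a, b) for a, b in zip(with_special_case, without_special_case)
)
print(same)
```

In this toy setting both variants reach the original parameters, because the dict initially stores the same `Parameter` objects that the module uses. It does not show that the two paths are interchangeable in the repository itself, where the forward pass may differ in other ways.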