Questions about the implementation of the inner loop

Hi,

Thanks for sharing the code. I have questions about the implementation of the inner loop:
Is there any reason for the special case of i == 0? Can we just use fast_weights for i == 0?

Thanks!

Hi, it's been a while since I wrote this, but I believe the reason has to do with computing the meta-gradient here: <https://github.com/katerakelly/pytorch-maml/blob/f4d36caa4cdd43d28332befaf88b4df8551c0f34/src/inner_loop.py#L76>. I think if you use fast_weights for i == 0, then PyTorch doesn't have the model parameters in the graph it constructed during the train updates, and so it's not able to compute the gradient w.r.t. them. Essentially, I just needed to retain a pointer to the original model parameters, but maybe that wasn't the most intuitive way to do it.
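To make the graph argument concrete, here is a minimal sketch (not the repository's code) of one inner step on a toy linear model. The names `w`, `loss_fn`, `step_size`, and the random data are assumptions made up for illustration. It shows the meta-gradient flowing when the fast weight is built from the original parameter, and coming back as `None` when the inner step starts from a detached copy.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins: one weight vector plays the role of the model parameters.
w = torch.randn(3, requires_grad=True)
x_train, y_train = torch.randn(5, 3), torch.randn(5)
x_val, y_val = torch.randn(5, 3), torch.randn(5)
step_size = 0.1


def loss_fn(weights, x, y):
    """Mean squared error of a linear model with the given weights."""
    return ((x @ weights - y) ** 2).mean()


# Connected case: the fast weight is a function of w, and create_graph=True
# keeps the inner gradient itself differentiable.
train_loss = loss_fn(w, x_train, y_train)
(grad,) = torch.autograd.grad(train_loss, w, create_graph=True)
fast_w = w - step_size * grad
meta_grad = torch.autograd.grad(loss_fn(fast_w, x_val, y_val), w)[0]

# Disconnected case: the inner step starts from a detached copy, so w never
# enters the graph that produces the validation loss.
w_copy = w.detach().clone().requires_grad_(True)
copy_loss = loss_fn(w_copy, x_train, y_train)
(copy_grad,) = torch.autograd.grad(copy_loss, w_copy, create_graph=True)
fast_copy = w_copy - step_size * copy_grad
disconnected = torch.autograd.grad(
    loss_fn(fast_copy, x_val, y_val), w, allow_unused=True
)[0]

print(meta_grad.shape, disconnected)
```

Without `allow_unused=True`, the second `torch.autograd.grad` call raises a `RuntimeError` instead of returning `None`, which is the failure mode described above.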

On Tue, Sep 18, 2018 at 4:59 PM Hung-Yu Tseng wrote:
> Hi, Thanks for sharing the code. I have questions about the implementation for inner loop <https://github.com/katerakelly/pytorch-maml/blob/f4d36caa4cdd43d28332befaf88b4df8551c0f34/src/inner_loop.py#L57> : Is there any reason for the special case of i == 0? Can we just use fast_weights for i == 0? Thanks!

-- Kate Rakelly UC Berkeley EECS PhD Student rakelly@eecs.berkeley.edu

Thanks for the explanation!

Hi,
sorry to reopen this discussion. You define fast_weights to be the same as the original weights here (pytorch-maml/src/inner_loop.py, line 53 in f4d36ca):

fast_weights = OrderedDict((name, param) for (name, param) in self.named_parameters())
for i in range(self.num_updates):

so there should already be a "pointer" to the original model parameters for computing the meta-gradient. Thus, there should be no need to distinguish i == 0. Do you think my observation is correct?
Thanks.
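One way to check this observation is a small experiment. The sketch below is an assumption-laden toy, not the repository's code: `model`, `functional_loss`, `meta_gradient`, and the random data are hypothetical, and it runs a single inner step on an `nn.Linear` in a current PyTorch release. It compares the meta-gradient with and without the i == 0 special case, and checks that the dict built from `named_parameters()` holds the very same tensor objects as the model.

```python
from collections import OrderedDict

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy setup standing in for the real network and task data.
model = nn.Linear(4, 1)
x_train, y_train = torch.randn(8, 4), torch.randn(8, 1)
x_val, y_val = torch.randn(8, 4), torch.randn(8, 1)
step_size = 0.1


def functional_loss(weights, x, y):
    """Forward pass that uses an explicit weight dict instead of the module."""
    return F.mse_loss(F.linear(x, weights["weight"], weights["bias"]), y)


def meta_gradient(special_case_first_step):
    """One inner update, then the validation-loss gradient w.r.t. the model."""
    fast_weights = OrderedDict(model.named_parameters())
    if special_case_first_step:
        # Mirrors the i == 0 branch: go through the module itself.
        loss = F.mse_loss(model(x_train), y_train)
        grads = torch.autograd.grad(loss, list(model.parameters()), create_graph=True)
    else:
        # Proposed simplification: use fast_weights from the first step on.
        loss = functional_loss(fast_weights, x_train, y_train)
        grads = torch.autograd.grad(loss, list(fast_weights.values()), create_graph=True)
    fast_weights = OrderedDict(
        (name, param - step_size * grad)
        for (name, param), grad in zip(fast_weights.items(), grads)
    )
    val_loss = functional_loss(fast_weights, x_val, y_val)
    return torch.autograd.grad(val_loss, list(model.parameters()))


# The dict values are the same Parameter objects, not copies.
same_objects = all(
    p is q
    for p, q in zip(OrderedDict(model.named_parameters()).values(), model.parameters())
)
with_special = meta_gradient(True)
without_special = meta_gradient(False)
grads_match = all(torch.allclose(a, b) for a, b in zip(with_special, without_special))

print(same_objects, grads_match)
```

If both flags come out true in your environment, the special case is redundant for this toy setup. Whether the same held for the PyTorch version and functional forward pass used in the repository at the time is not something this sketch can confirm.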