How to apply several SGD steps within the inner loop?
Opened this issue · 13 comments
Hi @mari-linhares , thanks for the repo!
We are building on your code to implement a somewhat more general version of MAML that includes a batch of tasks in the inner loop and several gradient-descent steps with respect to the parameters of each task. However, we are stuck on how to add several SGD steps within your code using TensorFlow 2.0. Do you have any idea how to do that?
I've also been trying to build off this repo, but have encountered the same issue. It seems that updating the weights manually as done here makes them non-trainable. @davidjimenezphd Have you found a workaround? Without multiple inner loop SGD steps, this repo doesn't actually run the full version of MAML.
Hi Alekxos. Yes, we found a solution based on "watch"-ing some variables in the gradient tape. Give me some time and I'll try to upload it.
It's definitely a bug in TensorFlow. We worked around it by doing the following:
- build a copy of the meta-network and train it for one step (inner training); now the weights of the copy are not trainable (N=1)
Here is where the patch begins:
- make a new copy of the meta-model and initialize it
- manually set the weights of the new copy (you need to iterate through the layers by hand) with the weights from the trained copy; now you can use this copy and train it again (N>1). You need to repeat this for every training step...
This is a bit hacky and needs some extra computation (for copying and forwarding through the net), but TensorFlow has so many open issues that we will use this as long as the bug exists ;-) (and I think it will be there for a while...)
See our Tensorflow issue: tensorflow/tensorflow#34335
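The copy-per-step workaround described above can be sketched as follows. This is a minimal, self-contained illustration, not the repo's actual code: the model, `seeded_copy`, and the toy data are all hypothetical, and a single Dense layer stands in for the meta-network.

```python
import tensorflow as tf

lr_inner = 0.01

def build_meta_model():
    # Hypothetical stand-in for the meta-network: one Dense layer.
    m = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    m.build((None, 1))  # create the copy's own trainable Variables
    return m

def seeded_copy(trained):
    # Fresh copy with real Variables, seeded from the trained copy by
    # manually iterating through the layers (as the workaround describes).
    copy = build_meta_model()
    for dst, src in zip(copy.layers, trained.layers):
        dst.kernel.assign(src.kernel)
        dst.bias.assign(src.bias)
    return copy

x = tf.random.normal([4, 1])
y = 2.0 * x
meta_model = build_meta_model()

task_model = seeded_copy(meta_model)
for _ in range(3):  # N > 1 inner steps
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(task_model(x) - y))
    grads = tape.gradient(loss, task_model.trainable_variables)
    # Manual SGD step: kernel/bias become plain tensors afterwards,
    # which is exactly why the model stops being trainable.
    for layer, (gw, gb) in zip(task_model.layers,
                               zip(grads[::2], grads[1::2])):
        layer.kernel = layer.kernel - lr_inner * gw
        layer.bias = layer.bias - lr_inner * gb
    task_model = seeded_copy(task_model)  # restore trainability for next step
```

The re-copy at the end of each iteration is the "patch": it trades extra forward/copy work for a model whose weights are `tf.Variable`s again.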
Hi @shufflebyte
This is actually not a tensorflow bug.
def copy_model(model, x):
    copied_model = MetaModel()
    copied_model.forward(x)
    copied_model.set_weights(model.get_weights())
    return copied_model
In this function, Model.get_weights actually returns numpy arrays, and Model.set_weights overwrites weight values from numpy arrays rather than replacing the trainable variables with another set of variables. Therefore, in effect, this function does not copy a model in the way we expect.
This is not problematic in this repo because we do manual replacement:
k = 0
model_copy = copy_model(model, x)
for j in range(len(model_copy.layers)):
    model_copy.layers[j].kernel = tf.subtract(model.layers[j].kernel,
                                              tf.multiply(lr_inner, gradients[k]))
    model_copy.layers[j].bias = tf.subtract(model.layers[j].bias,
                                            tf.multiply(lr_inner, gradients[k+1]))
    k += 2
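The point that set_weights copies values rather than variables can be checked with a small snippet (the two-layer setup here is illustrative): the copy ends up with the same numbers but entirely distinct Variable objects, so no gradient connection exists between the two models.

```python
import tensorflow as tf

# Two independently constructed models of the same architecture.
src = tf.keras.Sequential([tf.keras.layers.Dense(2)])
src.build((None, 3))
dst = tf.keras.Sequential([tf.keras.layers.Dense(2)])
dst.build((None, 3))

# set_weights transfers numpy values only; dst keeps its own Variables.
dst.set_weights(src.get_weights())

# Same values...
assert all((a.numpy() == b.numpy()).all()
           for a, b in zip(src.weights, dst.weights))
# ...but distinct Variable objects, so gradients cannot flow between them.
assert all(a is not b for a, b in zip(src.weights, dst.weights))
```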
Hi @llan-ml
I have been stuck on this issue for a while.
Surely we can update the parameters of the copied model manually, but if we need to add several SGD steps in our inner loop to update the copied model several times, we need to compute gradients on the copied model. Since there are no trainable variables in the copied model, GradientTape cannot compute the gradients.
Actually, I tried directly applying a tf.keras.optimizers.SGD() to update the fast weights; this keeps the variables in the copied model trainable.
Have you found out how to add a batch and several SGD steps?
I have been stuck on this problem for some days. I tried to use two tapes to watch the whole batch process, and used the stop_recording() function during the batch process to control it. It seems I can add several SGD steps to update the fast weights several times, but I failed to compute the gradients of the whole batch; it returns a list of None. Could you please tell me how you solved this problem?
Hi @HilbertXu
In the case of multiple inner gradient steps, you need to manually watch the weight tensors (they are no longer tf.Variable instances), and then the tape can compute their gradients.
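The advice above can be sketched with a toy functional example. Everything here is illustrative (a single dense layer treated as plain tensors rather than a real MetaModel): the fast weights are ordinary tensors after the first update, so the inner tape must watch them explicitly, while the outer tape differentiates through all inner steps back to the meta-parameters.

```python
import tensorflow as tf

lr_inner = 0.01

def forward(x, w, b):
    return tf.matmul(x, w) + b

def compute_loss(x, y, w, b):
    return tf.reduce_mean(tf.square(forward(x, w, b) - y))

# Meta-parameters are real tf.Variables.
w = tf.Variable(tf.random.normal([1, 1]))
b = tf.Variable(tf.zeros([1]))

x = tf.random.normal([4, 1])
y = 3.0 * x + 1.0

with tf.GradientTape() as outer_tape:
    fast_w, fast_b = tf.identity(w), tf.identity(b)
    for _ in range(3):  # several inner SGD steps
        with tf.GradientTape(watch_accessed_variables=False) as inner_tape:
            # fast_w/fast_b are tensors, not Variables: watch them manually.
            inner_tape.watch([fast_w, fast_b])
            inner_loss = compute_loss(x, y, fast_w, fast_b)
        gw, gb = inner_tape.gradient(inner_loss, [fast_w, fast_b])
        fast_w = fast_w - lr_inner * gw
        fast_b = fast_b - lr_inner * gb
    outer_loss = compute_loss(x, y, fast_w, fast_b)

# Gradients flow back through all inner steps to the meta-parameters,
# because the inner gradient computation happens inside the outer tape.
meta_grads = outer_tape.gradient(outer_loss, [w, b])
```

Note that `inner_tape.gradient(...)` is called inside the outer tape's context, which is what lets the outer tape record the inner updates and produce second-order gradients.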
Hi @llan-ml
Thanks for your help, I will try it later.
Hi @HilbertXu
I wrote a toy MAML-like script, which may be helpful for you. Please let me know if you find that the implementation is correct and works in more practical situations.
Hi @llan-ml
It says that I don't have access to your files. Could you please help me with this?
Maybe we can chat on WeChat or by email? My ss server has been blocked, so it's hard for me to access Colab.
I forgot to enable sharing of that link, and now it should be accessible. Also, feel free to access me by email in my profile.
But I also get an error: why does model.get_weights() return an empty list?
with tf.GradientTape() as support_tape:
    support_tape.watch(model.trainable_variables)
    y_pred = model.forward(x1[i])
    support_loss = compute_loss(y1, y_pred)
gradients = support_tape.gradient(support_loss, model.trainable_variables)
# inner_optimizer.apply_gradients(zip(gradients, model.trainable_variables))
k = 0
for j in range(len(model.layers)):
    model.layers[j].kernel = tf.subtract(model.layers[j].kernel, tf.multiply(lr_inner, gradients[k]))
    model.layers[j].bias = tf.subtract(model.layers[j].bias, tf.multiply(lr_inner, gradients[k + 1]))
    k += 2
print(model.get_weights())
with tf.GradientTape() as outer_tape:
    copied_model = model
    for _ in range(2):
        with tf.GradientTape(watch_accessed_variables=False) as inner_tape:
            inner_tape.watch(copied_model.inner_weights)
            inner_loss = compute_loss(copied_model, x, y)
        inner_grads = inner_tape.gradient(inner_loss, copied_model.inner_weights)
        # print(inner_grads)
        # print("================")
        copied_model = MetaModel.copy_from(copied_model, inner_grads)
    outer_loss = compute_loss(copied_model, x, y)
outer_grads = outer_tape.gradient(outer_loss, model.inner_weights)
optimizer.apply_gradients(zip(outer_grads, model.inner_weights))
And when I tried your code, model and copied_model were the same object: when you update copied_model, it also updates model.
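The aliasing mentioned above is easy to reproduce: plain assignment does not copy a Keras model, it only binds another name to the same object. A minimal demonstration (the one-layer model is illustrative), including one way to get a genuinely independent copy with clone_model:

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.build((None, 1))

# Plain assignment: both names refer to the same object,
# so "updating the copy" updates the original too.
copied_model = model
assert copied_model is model

# An independent copy: clone the architecture, then copy the values across.
real_copy = tf.keras.models.clone_model(model)
real_copy.build((None, 1))
real_copy.set_weights(model.get_weights())
assert real_copy is not model
```

Note that a clone made this way has its own Variables, so (as discussed earlier in this thread) it carries no gradient connection back to the original model.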