yaoyao-liu/meta-transfer-learning

A question about the meta-training strategy

Sword-keeper opened this issue · 15 comments

Hi, while reading your code I noticed that your meta-training strategy differs from MAML's. Could you tell me which meta-learning paper this strategy comes from, or is it your own design? Also, why did you choose this strategy?

Hi,

What do you mean by “training strategy”? Do you mean that we introduce a “pre-training” phase?

Best,
Yaoyao

I mean the meta-training phase. In MAML's outer loop, the loss used to update the model's parameters is the sum of all tasks' losses (100 training tasks), so in each outer-loop epoch the parameters are updated only once. In your PyTorch version, however, the outer loop updates the parameters with each task's loss separately, so in each outer-loop epoch they are updated 100 times (once per training task). This figure may explain it more clearly.

[Figure: diagram of the outer-loop update strategies (image not available)]

I think you have misunderstood MAML.

MAML doesn't use the losses of all tasks to update the model in the outer loop. Our MTL uses a meta-training strategy similar to MAML's. Your figure doesn't show the strategy MAML actually uses.

MAML uses a "meta-batch" strategy: each outer-loop iteration updates the model with the average loss over 4 tasks. In our method, we simply set the meta-batch size to 1.

Oh, I see. Thank you very much. Could you tell me why you set the meta-batch size to 1? And what does "meta-batch" mean?

If the meta-batch size is 4, then in one outer-loop iteration the model is updated with the average loss of 4 different tasks.
I set the meta-batch size to 1 because it is easier to implement...
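
To make the meta-batch idea concrete, here is a toy, pure-Python sketch; it is not code from MAML or MTL. It assumes a one-parameter model and a made-up quadratic task loss (w - a)^2, where each task is just a target value a, and the names `inner_update`, `meta_gradient` and `meta_train` are hypothetical. Each outer-loop iteration samples `meta_batch_size` tasks, averages their meta-gradients and takes a single step, so `meta_batch_size=4` mirrors the MAML setting described above and `meta_batch_size=1` mirrors MTL.

```python
import random


def inner_update(w, a, alpha):
    """One base-learner (inner-loop) gradient step on the toy task loss (w - a)**2."""
    return w - alpha * 2.0 * (w - a)


def meta_gradient(w, a, alpha):
    """Exact MAML meta-gradient of the post-adaptation loss (w' - a)**2 w.r.t. w.

    Since w' = w - 2 * alpha * (w - a), the chain rule gives dw'/dw = 1 - 2 * alpha.
    """
    w_adapted = inner_update(w, a, alpha)
    return 2.0 * (w_adapted - a) * (1.0 - 2.0 * alpha)


def meta_train(task_pool, meta_batch_size, iterations, alpha=0.1, beta=0.05, seed=0):
    """Outer loop: each iteration averages the meta-gradients of one meta-batch
    of tasks and applies a single update to the shared initialization w."""
    rng = random.Random(seed)
    w = 0.0
    outer_updates = 0
    for _ in range(iterations):
        batch = rng.sample(task_pool, meta_batch_size)  # tasks for this iteration
        grad = sum(meta_gradient(w, a, alpha) for a in batch) / meta_batch_size
        w -= beta * grad  # exactly one outer-loop update per meta-batch
        outer_updates += 1
    return w, outer_updates


# Each toy task is identified by its target value a.
pool = [i / 10 for i in range(-10, 11)]
w_maml_style, _ = meta_train(pool, meta_batch_size=4, iterations=500)
w_mtl_style, _ = meta_train(pool, meta_batch_size=1, iterations=500)
```

With `meta_batch_size=1`, every sampled task triggers its own outer update, which is noisier but simpler to implement; in this toy, both settings settle near the same initialization (the mean target, 0).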

Well... thank you~

No problem.

I think your figure is correct, but n is not 100; in MAML's settings it is 4.

Also, n is not the total number of tasks. In MAML, we can sample, e.g., 10000 tasks, and the 4 tasks in one meta-batch are drawn from those 10000 tasks.
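
The distinction between n (tasks per meta-batch) and the total number of tasks can be sketched as follows. This is an illustrative assumption, not MAML's data loader: it treats a task as a set of 5 classes drawn from 64 training classes (the usual miniImageNet meta-train split), pre-samples a pool of 10000 tasks as in the example above, and draws 4 of them for one outer-loop iteration. `build_task_pool` and `sample_meta_batch` are hypothetical names.

```python
import math
import random


def build_task_pool(num_classes, n_way, pool_size, seed=0):
    """Pre-sample a fixed pool of N-way tasks; each task is a sorted tuple of class ids."""
    rng = random.Random(seed)
    return [tuple(sorted(rng.sample(range(num_classes), n_way)))
            for _ in range(pool_size)]


def sample_meta_batch(task_pool, meta_batch_size, rng):
    """Draw the tasks used in ONE outer-loop iteration from the larger pool."""
    return rng.sample(task_pool, meta_batch_size)


pool = build_task_pool(num_classes=64, n_way=5, pool_size=10000)
rng = random.Random(1)
meta_batch = sample_meta_batch(pool, meta_batch_size=4, rng=rng)

print(len(pool), len(meta_batch))  # 10000 4
print(math.comb(64, 5))            # 7624512 possible 5-way class combinations
```

So n = 4 is the meta-batch size, while the pool (and, in principle, the space of all 5-way class combinations) is far larger.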

Oh, you are right. I misunderstood this figure.

@Sword-keeper Hello, I agree with you, and thanks to the authors for their helpful reply.
I guess the main difference between MTL and MAML w.r.t. the “training strategy” is the setting of meta_batch_size, which is 4 in MAML and 1 in MTL. Besides, I guess "update 100 times" refers to the parameter update_batch_size ($k$ in your figure) in the MAML code, which is set to 5 in MAML but 100 in MTL? I'm also puzzled about this (e.g., line 101 in meta-transfer-learning/pytorch/trainer/pre.py):
for _ in range(1, self.update_step):

Hi @LavieLuo,

Thanks for your interest in our work.
In MAML, all network parameters are updated 5 times during base-learning.
In our MTL, only the FC layer is updated during base-learning, 100 times.
Because we update far fewer parameters than MAML, we can afford to update them more times.
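
The trade-off described above can be sketched in pure Python. This is not the repository's PyTorch code: it assumes the feature extractor is frozen (its outputs are replaced by fixed toy feature vectors) and fine-tunes only a linear FC head with plain gradient descent for `update_step=100` steps. The learning rate and all function names are hypothetical. Because the head has only n_way * (dim + 1) parameters, each of the 100 steps is a small computation, whereas a MAML-style inner step has to compute gradients for every layer of the network.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def fc_logits(W, b, x):
    """Linear (FC) classifier head: one row of W and one bias per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + bc for row, bc in zip(W, b)]


def loss_and_grads(W, b, feats, labels):
    """Mean cross-entropy over the support set and its gradients w.r.t. W and b."""
    n_way, dim = len(W), len(W[0])
    gW = [[0.0] * dim for _ in range(n_way)]
    gb = [0.0] * n_way
    loss = 0.0
    for x, y in zip(feats, labels):
        p = softmax(fc_logits(W, b, x))
        loss -= math.log(p[y])
        for c in range(n_way):
            d = p[c] - (1.0 if c == y else 0.0)  # dLoss/dlogit_c
            gb[c] += d
            for j in range(dim):
                gW[c][j] += d * x[j]
    n = len(feats)
    return loss / n, [[g / n for g in row] for row in gW], [g / n for g in gb]


def base_learn_fc_only(feats, labels, n_way, update_step=100, lr=0.5):
    """Base-learning that updates ONLY the FC head.

    `feats` stand in for outputs of a frozen feature extractor, so the only
    trainable parameters are the n_way * (dim + 1) weights and biases.
    """
    dim = len(feats[0])
    W = [[0.0] * dim for _ in range(n_way)]
    b = [0.0] * n_way
    losses = []
    for _ in range(update_step):
        loss, gW, gb = loss_and_grads(W, b, feats, labels)
        losses.append(loss)  # support-set loss before this step
        W = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W, gW)]
        b = [bc - lr * g for bc, g in zip(b, gb)]
    return W, b, losses


def predict(W, b, x):
    z = fc_logits(W, b, x)
    return z.index(max(z))


# Toy 5-way 1-shot support set with one-hot "features".
support_x = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
support_y = list(range(5))
W, b, losses = base_learn_fc_only(support_x, support_y, n_way=5)
```

Starting from a zero-initialized head, the support-set loss begins at log 5 and keeps falling as the steps accumulate, i.e. the head is pushed toward fitting the support set.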

If you have any further questions, please send me an email or add a comment on this issue.

Best,
Yaoyao

@yaoyao-liu Wow, thank you for the prompt reply. Now I fully understand the motivation behind this strategy. That's cool! :)

@LavieLuo In my experience, performance doesn't drop even if the base-learner overfits the training samples of the target task. So I just update the FC layer as many times as I can, until it overfits.

@yaoyao-liu Yes, I agree! I remember some recent works showing that overfitting in DNNs manifests as probabilistic over-confidence, which doesn't necessarily degrade accuracy. Also, I had forgotten that MTL only trains part of the parameters; now I get it. Thanks again!

@LavieLuo
No problem.