cambridge-mlg/cnaps

normalization used in CNAPs

xialeiliu opened this issue · 11 comments

It is unclear which normalization is used in the original CNAPs: is it CBN or MetaBN?

jfb54 commented

In the original CNAPs, during both meta-training and meta-testing, batches are normalized using the fixed running statistics (i.e. mean and variance), scale (i.e. gamma), and offset (i.e. beta) that were learned during pre-training of the feature extractor weights (theta). This is done by freezing the feature extractor weights (theta) and then running the feature extractor in eval() mode during meta-training and meta-testing. Note that this implies that batch statistics are never actively calculated in the feature extractor during meta-training and meta-testing; instead, the fixed values learned during pre-training are relied upon. This is neither CBN nor MetaBN for the feature extractor.
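For concreteness, here is a minimal PyTorch sketch of that setup (the variable names and the use of torchvision's resnet18 are illustrative stand-ins, not the repo's exact code):

```python
import torch
from torchvision.models import resnet18

# Illustrative setup: a pre-trained feature extractor whose weights (theta),
# running statistics, gamma, and beta are all frozen at their pre-trained values.
feature_extractor = resnet18()  # in practice, load the pre-trained weights here
for param in feature_extractor.parameters():
    param.requires_grad = False  # theta is frozen

# eval() makes every BatchNorm layer normalize with its fixed running
# mean/variance rather than computing statistics from the current batch.
feature_extractor.eval()
with torch.no_grad():
    features = feature_extractor(torch.randn(8, 3, 224, 224))
```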

As an aside, the original CNAPs does actively compute batch statistics in the set encoder. This is OK because the set encoder only ever sees the context set (i.e. the training data for the task) and not the target set (i.e. the test data for the task).
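A tiny sketch of the distinction (the encoder architecture here is a stand-in, not the actual CNAPs set encoder):

```python
import torch
import torch.nn as nn

# Stand-in set encoder: BatchNorm stays in train() mode, so statistics are
# actively computed, but only ever from the context set of the task.
set_encoder = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU())
set_encoder.train()

context_images = torch.randn(5, 3, 84, 84)  # context set only; the target set never passes through
task_embedding = set_encoder(context_images).mean(dim=(0, 2, 3))  # pool into a task representation
```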

I hope that this helps. Let me know if this is not clear.

Thanks for your reply. This is super clear.

It brings up two more questions:

  1. For CNAPs, since it is pre-trained: is it trained on the whole of ImageNet using conventional batch training?
  2. Does it mean that for TaskNorm, the feature extractor is trained from scratch?
jfb54 commented

For CNAPs (with and without TaskNorm), the feature extractor is pre-trained using conventional batch training on the training split of the Meta-Dataset version of ILSVRC_2012, which consists of 712 classes (i.e. it is not pre-trained on all of ImageNet). During this pre-training, CBN is used. For convenience, a ResNet18 pre-trained in this fashion is distributed in the CNAPs repo. TaskNorm is only used during meta-training and meta-testing.
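In other words, the pre-training phase looks like ordinary supervised training. A hedged sketch (the optimizer settings and the random mini-batch are placeholders, not the repo's actual training script):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Conventional batch pre-training with CBN: the network is in train() mode,
# so BatchNorm computes statistics from each mini-batch and updates its
# running mean/variance, gamma, and beta.
model = resnet18(num_classes=712)  # 712 = Meta-Dataset training split of ILSVRC_2012
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

images = torch.randn(32, 3, 224, 224)   # placeholder mini-batch
labels = torch.randint(0, 712, (32,))
optimizer.zero_grad()
criterion(model(images), labels).backward()
optimizer.step()
```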

I see, it means that for TaskNorm, the context is processed in train() mode and the target in eval() mode. As seen here.
Also, I noticed that for the feature extractor, param.requires_grad = False, which means gamma and beta are always fixed and only the mean and variance are updated for TaskNorm.

Would it make more sense for TaskNorm if the target set used statistics from the support set, like MetaBN, instead of eval() mode?

jfb54 commented

> I see, it means that for TaskNorm, the context is processed in train() mode and the target in eval() mode. As seen here.
> Also, I noticed that for the feature extractor, param.requires_grad = False, which means gamma and beta are always fixed and only the mean and variance are updated for TaskNorm.

This is correct.

> Would it make more sense for TaskNorm if the target set used statistics from the support set, like MetaBN, instead of eval() mode?

When using TaskNorm and processing the target set, the support set statistics are used, blended with the auxiliary moment (usually instance norm) computed from the target. See:

```python
else:  # compute the pooled moments for the target
    alpha = self.sigmoid(self.a * self.context_size + self.b)  # compute alpha with saved context size
    pooled_mean, pooled_var = self._compute_pooled_moments(x, alpha, self.context_batch_mean,
                                                           self.context_batch_var,
                                                           self._get_augment_moment_fn())
```
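For intuition, here is a standalone sketch of the kind of pooling `_compute_pooled_moments` performs: a convex combination of the context moments and the auxiliary (e.g. instance norm) moments, with the variances corrected for the shift in mean. Treat this as an approximation under those assumptions, not a verbatim copy of the repo's code.

```python
import torch

def pooled_moments(context_mean, context_var, aux_mean, aux_var, alpha):
    # Blend the context-set moments with the auxiliary moments of the
    # target, weighted by alpha (computed from the saved context-set size).
    mean = alpha * context_mean + (1.0 - alpha) * aux_mean
    var = (alpha * (context_var + (context_mean - mean) ** 2)
           + (1.0 - alpha) * (aux_var + (aux_mean - mean) ** 2))
    return mean, var

# With a large context set, alpha is near 1 and the context statistics dominate.
m, v = pooled_moments(torch.tensor(0.0), torch.tensor(1.0),
                      torch.tensor(0.5), torch.tensor(2.0), alpha=0.9)
```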

Thanks a lot, this clears up quite a few doubts in my head.
One last thing for now: what does Transductive Batch Normalization (TBN) mean for CNAPs in Table 2?
Does it mean that for each dataset, there is a specific set of BN parameters?
What does it mean for unseen domains like Traffic Signs and COCO?

jfb54 commented

TBN in CNAPs means that the feature extractor is in train() mode all the time during meta-training and meta-testing, for both the context and the target, i.e. batch statistics are always being calculated and used to normalize. So nothing is done differently for any particular dataset; it just means that the actual batch statistics are used to normalize every task, all the time.
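As a sketch (PyTorch semantics, illustrative names and a torchvision resnet18 stand-in):

```python
import torch
from torchvision.models import resnet18

# TBN: the feature extractor stays in train() mode for both context and
# target, so each forward pass normalizes with statistics computed from the
# batch actually being processed (which is what makes it transductive).
feature_extractor = resnet18()
feature_extractor.train()
context_features = feature_extractor(torch.randn(8, 3, 224, 224))  # normalized with context batch stats
target_features = feature_extractor(torch.randn(8, 3, 224, 224))   # normalized with target batch stats
```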

Feel free to ask any other questions. The details are rather complicated, and it took me a long time to get everything straight in my mind.

Oh, so the difference between CBN and TBN is mainly in the meta-testing phase, using either eval or train mode? Is that correct?

jfb54 commented

That is correct. Both TBN and CBN are in train mode for meta-training, but for meta-testing, CBN switches to eval mode, while TBN stays in train mode.
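To summarize the schedule in code (a hedged helper for illustration, not part of the repo):

```python
def set_bn_mode(model, method, meta_testing):
    # CBN: train() during meta-training, eval() during meta-testing.
    # TBN: train() during both meta-training and meta-testing.
    if method == "CBN" and meta_testing:
        model.eval()   # normalize with fixed running statistics
    else:
        model.train()  # normalize with statistics of the current batch
```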

Very interesting work! And thanks for your great explanations! I learned a lot today!