xyutao/fscil

Questions about Table 4 in the paper and the FSCIL conditions


Hi @xyutao,

Thanks for the great work! I have a question about the ablation study comparing "exemplars" and NG nodes:

  • Does the first row, Memory, represent the number of exemplars and the number of nodes used for knowledge representation, respectively? If so, I am confused about how the two can be compared with each other.

In my opinion, since the representation types are different, the definition of Memory in each setting may not be aligned: for example, the unit storage cost of an exemplar and that of a node may differ, so comparing the two settings may seem unfair, or they may not even be comparable. Could you elaborate on this ablation setting, such as the motivation and the implementation details?

I really appreciate any help you can provide.

Hi @wuyujack,

As mentioned in the paper, an alternative approach is to select a set of exemplars representative of the old class samples and penalize changes to their feature vectors. Therefore, a memory unit stores an exemplar's image sample and feature vector. Analogously, for an NG node, a memory unit mainly stores its centroid vector as well as a corresponding image sample for computing its observation.

Thus, the "memory" in Table 3 mainly refers to the stored (feature vector, image, etc.) tuples.

@wuyujack The motivation of this ablation study is to explore how representative the NG is for knowledge representation. As many distillation-based works in class-incremental learning mainly use "exemplars" for computing the knowledge, we want to see whether using a neural gas can achieve greater representation power.

@xyutao Thank you so much for the quick reply! After double-checking the paper, I realize that I did miss the information about $z_j$ in the definition of an NG node, as I had incorrectly categorized your paper as class-incremental learning without memory replay.

Then the next two questions are:

  • For a node, does $z_j$ contain only one image from $D^{(1)}$, the one whose feature vector $f$ is nearest to $m_j$? For example, as mentioned in Section 4 (Experiments),

We learn a NG net of 400 nodes for base classes,

does it mean that for the base classes you maintain a memory of 400 images? Is that correct?

  • And since we need to compute $\hat{\mathbf{m}}$ for the anchor loss (AL) term in Equation 5, during new class training the maintained memory (e.g., the 400 images mentioned above) is also included in the training dataset for new class learning. Is that also correct?

@wuyujack

  • Yes. $z_j$ is the single nearest image, used for computing the observation $\hat{m}_j$.

  • No. We don't interleave them with the new class training samples. In Eq. (7), the softmax cross-entropy loss is computed only on new class training samples; $z_j$ is used only for computing $\hat{m}_j$.

It is not a good idea to reuse $z_j$ for memory replay purposes: it leads to overfitting on these images and sacrifices generalization performance. We aim to consolidate the memory rather than replay it.
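To illustrate the separation, here is a rough NumPy sketch of the session objective as described above. It is only a sketch of the idea, not the repo's code: `model.classify`, `model.extract_features`, and the weight `lam` are hypothetical names.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Plain softmax cross-entropy over a batch (labels are integer class ids)."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def session_loss(model, new_images, new_labels, z_images, m_old, lam=1.0):
    """Training objective for one incremental session, per the discussion:
    cross-entropy is computed on NEW class samples only, while the stored
    z_j are used solely to recompute the observations hat_m_j = f(z_j)
    for the anchor-loss (AL) term."""
    ce = cross_entropy(model.classify(new_images), new_labels)
    m_hat = model.extract_features(z_images)  # hat_m_j, one row per NG node
    al = np.sum((m_hat - m_old) ** 2)         # penalize drift from the stored m_j
    return ce + lam * al                      # lam: hypothetical AL weight
```

Note that `z_images` never enters the cross-entropy term, which is the sense in which the memory is consolidated rather than replayed.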

@xyutao I see and thanks for the detailed explanation! It does help me understand the paper better.

BTW, are you also going to release the TPCIL (ECCV 2020) code in the future?

Yes. As soon as I get permission from my funder. xD

@xyutao Thanks! BTW, could you provide the MXNet version you used for the code, and also the GPU & CUDA versions?

Hi @xyutao, could you also provide the gluoncv and numpy versions, along with the corresponding MXNet version?

Hi @wuyujack When doing the paper, I used an earlier MXNet version, 1.3, with gluoncv 0.3, CUDA 9.0 and numpy 1.14, on Titan Xp cards. Later releases should also work fine.
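In case it helps when setting up the environment, a quick Python check of the installed versions against the ones listed above (the exact pins you need may differ; this is just a sanity check):

```python
import mxnet, gluoncv, numpy

# Versions reported above: mxnet 1.3, gluoncv 0.3, numpy 1.14 (CUDA 9.0, Titan Xp)
print("mxnet  :", mxnet.__version__)
print("gluoncv:", gluoncv.__version__)
print("numpy  :", numpy.__version__)
```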

Thanks!

Hi @xyutao! Recently, I found your oral talk on the CVPR 2020 virtual portal. On the 3rd page of your slides, "Few-Shot Class-Incremental Learning (FSCIL)", you list the "Conditions" for FSCIL as:

  • A unified classifier;
  • The old class training set is unavailable;
  • The new class training samples are few;

However, based on our previous discussion, for an NG node a memory unit mainly stores its centroid vector as well as a corresponding image sample for computing its observation. I think this implies that the old class training set is accessible during incremental learning, even though you don't interleave those samples with the new class training samples. This confuses me about your conditions for FSCIL.

Hi @wuyujack

The first condition inherits from the class-incremental learning setting, where a single incremental head is used for all tasks [a], rather than assigning an isolated head to each task.

The second condition inherits from the common incremental/continual/lifelong learning setting. As the training set of session t becomes unavailable at session (t+1), most methods (e.g., [a,b,c,d]) use a small external memory to store exemplars extracted from the training set at the end of session t.

The third condition is specific to FSCIL.

[a] Learning a unified classifier incrementally via rebalancing
[b] Large scale incremental learning
[c] Gradient episodic memory for continual learning
[d] End-to-end incremental learning

@xyutao thanks for your explanation and references. Yep, by storing a small external memory, this kind of method still does not have complete access to the old class training set; it only has limited access through the memory. Actually, I originally thought that condition 2 was an extremely strict constraint, such that no image or corresponding representation was allowed to be stored.

BTW, may I know whether the remaining parts of the code are going to be released soon? I am also trying to replicate your method, and it would be better if I could have the complete version for reference.

We are doing a journal extension of this paper. The remaining parts will be released soon.