Deep Class-Incremental Learning: A Survey

The code repository for "Deep Class-Incremental Learning: A Survey" in PyTorch. If you use any content of this repo for your work, please cite the following bib entry:

@article{zhou2023class,
    author = {Zhou, Da-Wei and Wang, Qi-Wei and Qi, Zhi-Hong and Ye, Han-Jia and Zhan, De-Chuan and Liu, Ziwei},
    title = {Deep Class-Incremental Learning: A Survey},
    journal = {arXiv preprint arXiv:2302.03648},
    year = {2023}
 }

Feel free to open a new issue or send us an email if you find any interesting papers missing from our survey, and we will include them in the next version.

Updates

[02/2023] arXiv paper has been released.

[02/2023] The code has been released.

Introduction

Deep models, e.g., CNNs and Vision Transformers, have achieved impressive results in many vision tasks in the closed world. However, novel classes emerge from time to time in our ever-changing world, requiring a learning system to acquire new knowledge continually. For example, a robot needs to understand new instructions, and an opinion monitoring system should analyze emerging topics every day. Class-Incremental Learning (CIL) enables the learner to incorporate the knowledge of new classes incrementally and build a universal classifier among all seen classes. Yet when the model is trained directly on new class instances, a fatal problem occurs: the model tends to catastrophically forget the characteristics of former classes, and its performance drastically degrades. There have been numerous efforts to tackle catastrophic forgetting in the machine learning community. In this paper, we comprehensively survey recent advances in deep class-incremental learning and summarize these methods from three aspects, i.e., data-centric, model-centric, and algorithm-centric. We also provide a rigorous and unified evaluation of 16 methods on benchmark image classification tasks to find out the characteristics of different algorithms empirically. Furthermore, we notice that the current comparison protocol ignores the influence of memory budget in model storage, which may result in unfair comparison and biased results. Hence, we advocate fair comparison by aligning the memory budget in evaluation, as well as several memory-agnostic performance measures.

Requirements

Environment

Here are the requirements for running the code:

  1. torch 1.8.1
  2. torchvision 0.6.0
  3. tqdm
  4. numpy
  5. scipy
  6. quadprog
  7. POT

Dataset

We use CIFAR100 and ImageNet100/1000 for our experiments. CIFAR100 will be automatically downloaded when running the code.

Here is the file list of ImageNet100.

Pre-Trained Models

As discussed in the main paper, we aim for a fair comparison among different methods and align the performance at the first stage. Please refer to the instructions to download the pre-trained models in the following section.

Code Structures

  • checkpoints: We supply the same pre-trained checkpoints for most methods for a fair comparison. Please download the checkpoints from Google Drive or OneDrive and put them in this folder.
  • convs: The network structures adopted in the implementation.
  • exps: The default config files for the compared methods. Note that values in these config files are overridden by parameters passed via the command line (see the function setup_parser in main.py).
  • scripts: The scripts for running the code in our evaluations.
  • models: The implementation of different CIL methods.
  • utils: Useful functions for dataloader and incremental actions.
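
The override behaviour described for exps can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: the helper names `build_parser` and `merge_config`, the `--memory_size` and `--model_name` flags, and the JSON keys are all hypothetical. The actual parameters are defined in setup_parser in main.py.

```python
import argparse
import json


def build_parser():
    # Hypothetical flags for illustration; the real ones live in
    # setup_parser in main.py and may differ.
    parser = argparse.ArgumentParser()
    parser.add_argument("--memory_size", type=int, default=None)
    parser.add_argument("--model_name", type=str, default=None)
    return parser


def merge_config(file_config, cli_args):
    """Return file_config with every explicitly set CLI value layered on top."""
    merged = dict(file_config)
    for key, value in vars(cli_args).items():
        if value is not None:  # only flags the user actually passed
            merged[key] = value
    return merged


# Stand-in for the contents of a JSON config file under exps/ (assumed layout).
file_config = json.loads('{"model_name": "icarl", "memory_size": 2000}')
cli_args = build_parser().parse_args(["--memory_size", "2603"])
config = merge_config(file_config, cli_args)
print(config)
```

Using `None` as the default for every flag is one simple way to tell "not passed" apart from "passed with the default value", so that only explicit command-line choices replace what the config file says.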

Supported Methods

  • FineTune: Baseline method which simply updates parameters on new tasks and suffers from catastrophic forgetting.
  • EWC: Overcoming catastrophic forgetting in neural networks. PNAS 2017 [paper]
  • LwF: Learning without Forgetting. ECCV 2016 [paper]
  • Replay: Baseline method with exemplars.
  • GEM: Gradient Episodic Memory for Continual Learning. NIPS 2017 [paper]
  • iCaRL: Incremental Classifier and Representation Learning. CVPR 2017 [paper]
  • BiC: Large Scale Incremental Learning. CVPR 2019 [paper]
  • WA: Maintaining Discrimination and Fairness in Class Incremental Learning. CVPR 2020 [paper]
  • PODNet: PODNet: Pooled Outputs Distillation for Small-Tasks Incremental Learning. ECCV 2020 [paper]
  • DER: DER: Dynamically Expandable Representation for Class Incremental Learning. CVPR 2021 [paper]
  • RMM: RMM: Reinforced Memory Management for Class-Incremental Learning. NeurIPS 2021 [paper]
  • Coil: Co-Transport for Class-Incremental Learning. ACM MM 2021 [paper]
  • FOSTER: Feature Boosting and Compression for Class-incremental Learning. ECCV 2022 [paper]
  • MEMO: A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning. ICLR 2023 [paper]

Towards a Fair Comparison of Class-Incremental Learning

In the main paper, we conduct three types of empirical evaluations to find out the characteristics of different methods. They are listed as:

  • Benchmark comparison: compares the performance of different methods with the same number of exemplars, e.g., 2000 for CIFAR100.
  • Memory-aligned comparison: compares the performance of different methods under the same memory budget as DER. For methods that consume less memory than DER, we align the cost by saving extra exemplars. For example, a ResNet32 model has 463,504 parameters (float), while a CIFAR image requires 3 × 32 × 32 integer numbers (int). Hence, the budget for saving a backbone is equal to saving 463,504 floats × 4 bytes/float ÷ (3 × 32 × 32) bytes/image ≈ 603 exemplars for CIFAR. The memory cost of different methods under the traditional benchmark protocol is shown in the left figure, while the memory-aligned comparison advocates the comparison shown in the right figure.
  • Memory-agnostic comparison: extends the memory-aligned comparison to memory-agnostic performance measures, e.g., AUC-A and AUC-L. We set several memory budgets, align the cost of each method to each of them, and draw the performance-memory curve. The memory-agnostic comparison does not depend on any single assigned budget, so it better measures the extendability of class-incremental learning models.
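
The backbone-to-exemplar conversion used in the memory-aligned comparison can be checked with a few lines of Python. This is only a sketch of the arithmetic quoted above, assuming 4-byte floats and 1 byte per pixel value; the function name `exemplars_per_backbone` is hypothetical, and compute_exemplar.py remains the reference for the repository's actual calculation.

```python
def exemplars_per_backbone(num_params, bytes_per_param=4, image_shape=(3, 32, 32)):
    """Number of whole images that fit in the memory taken by num_params floats.

    Assumes each pixel value is stored as a single byte (uint8).
    """
    bytes_per_image = 1
    for dim in image_shape:
        bytes_per_image *= dim
    # Integer division: only complete exemplars can be stored.
    return (num_params * bytes_per_param) // bytes_per_image


resnet32_params = 463_504  # parameter count quoted in the text above
print(exemplars_per_backbone(resnet32_params))  # 603
```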

Method   CIFAR100 Base0 Inc10   ImageNet100 Base50 Inc5
         AUC-A     AUC-L        AUC-A     AUC-L
GEM      4.31      1.70         -         -
Replay   10.49     8.02         553.6     470.1
iCaRL    10.81     8.64         607.1     527.5
PODNet   9.42      6.80         701.8     624.9
Coil     10.60     7.82         601.9     486.5
WA       10.80     8.92         666.0     581.7
BiC      10.73     8.30         592.7     474.2
FOSTER   11.12     9.03         638.7     566.3
DER      10.74     8.95         699.0     639.1
MEMO     10.85     9.03         713.0     654.6
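
AUC-A and AUC-L summarize a performance-memory curve as a single number. The sketch below shows one common way to compute the area under such a curve, the trapezoidal rule. It is an assumption that this matches the survey's procedure: the exact budgets, units, and any normalization are not specified here, the function name `area_under_curve` is hypothetical, and the input values are placeholders, not results from the survey.

```python
def area_under_curve(budgets, scores):
    """Trapezoidal area under a curve given by (budget, score) points.

    budgets must be sorted in increasing order and match scores in length.
    """
    if len(budgets) != len(scores) or len(budgets) < 2:
        raise ValueError("need at least two (budget, score) pairs of equal length")
    area = 0.0
    for i in range(1, len(budgets)):
        width = budgets[i] - budgets[i - 1]
        mean_height = (scores[i] + scores[i - 1]) / 2.0
        area += width * mean_height
    return area


# Placeholder numbers for illustration only; they are not taken from the survey.
budgets = [1.0, 2.0, 3.0]
scores = [50.0, 60.0, 70.0]
auc = area_under_curve(budgets, scores)
print(auc)  # 120.0
```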

Visualizations

We provide visualizations of the confusion matrix and the weight norm of classifiers in class-incremental learning. These visualizations are drawn from the logs produced by running the code.

Running scripts

There are three types of experiments in our survey, i.e., benchmark, memory-aligned (fair), and memory-agnostic (auc), in the scripts folder. We provide all the scripts for running the experiments in the paper. For example, if you are interested in the benchmark comparison, please run the following command:

bash ./scripts/benchmark/cifar_b0_5_finetune.sh

Similarly, you can run the other scripts in the same way.

Calculating the number of exemplars

Note that the memory-aligned and memory-agnostic comparison protocols require calculating the number of exemplars for each method. Please refer to compute_exemplar.py for more details. The following are examples of calculating the number of exemplars for the fair and auc protocols.

Fair

bash run-fair.sh

AUC

python compute_exemplar.py -p auc

Acknowledgment

This repo is modified from PyCIL.

Correspondence

This repo is developed and maintained by Qi-Wei Wang and Zhi-Hong Qi. If you have any questions, please feel free to contact us by opening new issues or email:
