Deep Class-Incremental Learning: A Survey

The code repository for "Deep Class-Incremental Learning: A Survey" in PyTorch.

Updates

[02/2023] The code has been released.

Introduction

Deep models, e.g., CNNs and Vision Transformers, have achieved impressive results in many vision tasks in the closed world. However, novel classes emerge from time to time in our ever-changing world, requiring a learning system to acquire new knowledge continually. For example, a robot needs to understand new instructions, and an opinion monitoring system should analyze emerging topics every day. Class-Incremental Learning (CIL) enables the learner to incorporate the knowledge of new classes incrementally and build a universal classifier among all seen classes. Unfortunately, when the model is trained directly on new class instances, a fatal problem occurs: the model tends to catastrophically forget the characteristics of former classes, and its performance drastically degrades. There have been numerous efforts to tackle catastrophic forgetting in the machine learning community. In this paper, we comprehensively survey recent advances in deep class-incremental learning and summarize these methods from three aspects, i.e., data-centric, model-centric, and algorithm-centric. We also provide a rigorous and unified evaluation of 16 methods on benchmark image classification tasks to empirically characterize the different algorithms. Furthermore, we notice that the current comparison protocol ignores the influence of the memory budget in model storage, which may result in unfair comparisons and biased results. Hence, we advocate fair comparison by aligning the memory budget in evaluation, as well as several memory-agnostic performance measures.

Requirements

Environment

Here are the requirements for running the code:

  1. torch 1.8.1
  2. torchvision 0.6.0
  3. tqdm
  4. numpy
  5. scipy
  6. quadprog
  7. POT
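
To see which of these packages are already installed, a small script along the following lines may help. It is only a sketch: the distribution names are copied from the list above on the assumption that they match the names used by the package index, and `report_versions` is a hypothetical helper that is not part of this repository.

```python
from importlib import metadata

# Distribution names as listed above; assumed to match their PyPI names.
REQUIRED = ["torch", "torchvision", "tqdm", "numpy", "scipy", "quadprog", "POT"]


def report_versions(packages):
    """Return {package: installed version, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in report_versions(REQUIRED).items():
        print(f"{name}: {version or 'not installed'}")
```

The script only reports what is installed; it does not compare against the versions listed above.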

Dataset

We use CIFAR100 and ImageNet100/1000 for our experiments. CIFAR100 will be automatically downloaded when running the code.

Pre-Trained Models

As discussed in the main paper, we aim for a fair comparison among different methods and align their performance at the first stage. The following section explains where to download the pre-trained models.

Code Structures

  • checkpoints: We supply the same pre-trained checkpoints for most methods to ensure a fair comparison. Please download the checkpoints from Google Drive or OneDrive and put them in this folder.
  • convs: The network structures adopted in the implementation.
  • exps: The default config files for the compared methods. Note that values in these config files are overridden by the parameters passed via the command line (see the function setup_parser in main.py).
  • scripts: The scripts for running the code in our evaluations.
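
The override of config-file values by command-line parameters can be pictured with the sketch below. It shows one common pattern, not the repository's actual code: the option names (`memory_size`, `init_cls`, `increment`) and the helper `merge_config` are hypothetical, and the real options are those defined in setup_parser in main.py, whose defaults may behave differently.

```python
import argparse
import json


def merge_config(file_config, cli_args):
    """Overlay command-line values on top of values loaded from a config file."""
    merged = dict(file_config)
    for key, value in vars(cli_args).items():
        if value is not None:  # only options the user actually passed
            merged[key] = value
    return merged


# Hypothetical options; the real ones are defined in setup_parser in main.py.
parser = argparse.ArgumentParser()
parser.add_argument("--memory_size", type=int, default=None)
parser.add_argument("--init_cls", type=int, default=None)

# Stand-in for the contents of a JSON config file from the exps folder.
file_config = json.loads('{"memory_size": 2000, "init_cls": 10, "increment": 10}')
cli_args = parser.parse_args(["--memory_size", "3000"])

print(merge_config(file_config, cli_args))
# {'memory_size': 3000, 'init_cls': 10, 'increment': 10}
```

In this sketch an option left unset on the command line keeps its config-file value; whether the repository does the same depends on the parser defaults.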

Towards a Fair Comparison of Class-Incremental Learning

In the main paper, we conduct three types of empirical evaluations to find out the characteristics of different methods. They are listed as:

  • Benchmark comparison: compares the performance of different methods with the same number of exemplars, e.g., 2000 for CIFAR100.

  • Memory-aligned comparison: compares the performance of different methods with the same memory budget as DER.

  • Memory-agnostic comparison: extends the memory-aligned comparison to the memory-agnostic performance measures, e.g., AUC-A and AUC-L. We set several memory budgets and align the cost of each method to them, drawing the performance-memory curve.

Method    CIFAR100 Base0 Inc10    ImageNet100 Base50 Inc5
          AUC-A     AUC-L         AUC-A     AUC-L
GEM       4.31      1.70          -         -
Replay    10.49     8.02          553.6     470.1
iCaRL     10.81     8.64          607.1     527.5
PODNet    9.42      6.80          701.8     624.9
Coil      10.60     7.82          601.9     486.5
WA        10.80     8.92          666.0     581.7
BiC       10.73     8.30          592.7     474.2
FOSTER    11.12     9.03          638.7     566.3
DER       10.74     8.95          699.0     639.1
MEMO      10.85     9.03          713.0     654.6
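
As a rough illustration of how one number can summarize a performance-memory curve, the sketch below integrates accuracy over the memory budget with the trapezoidal rule. This is an assumption made for illustration: the integration scheme and the units of the memory axis used in the paper are not stated here, `curve_auc` is a hypothetical helper, and the sample points are made up rather than taken from the table above.

```python
def curve_auc(budgets, accuracies):
    """Area under a piecewise-linear performance-memory curve (trapezoidal rule)."""
    if len(budgets) != len(accuracies) or len(budgets) < 2:
        raise ValueError("need at least two (budget, accuracy) points")
    points = sorted(zip(budgets, accuracies))  # order points by memory budget
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up numbers for illustration only (not results from the paper).
budgets = [8.0, 12.0, 16.0, 24.0]         # memory budget at each point
avg_accuracy = [0.60, 0.65, 0.68, 0.70]   # average accuracy at each budget

print(curve_auc(budgets, avg_accuracy))
```

Applying the same function to the last-stage accuracy instead of the average accuracy would give the analogue of AUC-L under the same assumptions.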

Visualizations

We provide visualizations of the confusion matrix and the weight norms of the classifiers in class-incremental learning. These visualizations are generated from the logs produced by running the code.
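
For readers unfamiliar with the two quantities, the sketch below computes them on toy data using only the standard library. It is not the repository's plotting code: `confusion_matrix` and `weight_norms` are hypothetical helpers, and the sketch assumes the classifier is a linear layer with one weight vector per class.

```python
import math


def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[i][j] counts samples of true class i that were predicted as class j."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def weight_norms(classifier_weights):
    """L2 norm of each class's weight vector (one row per class)."""
    return [math.sqrt(sum(w * w for w in row)) for row in classifier_weights]


# Toy data for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 1]
print(confusion_matrix(y_true, y_pred, num_classes=3))
print(weight_norms([[3.0, 4.0], [0.0, 1.0], [0.0, 0.0]]))
```

Plotting the matrix as a heatmap and the norms as a per-class curve gives figures of the kind described above.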

Running scripts

There are three types of experiments in our survey, i.e., benchmark, memory-aligned (fair), and memory-agnostic (auc), all located in the scripts folder. We provide all the scripts for running the experiments in the paper. For example, if you are interested in the benchmark comparison, please run the following command:

bash ./scripts/benchmark/cifar_b0_5_finetune.sh

Similarly, you can run the other scripts in the same way.

Calculating the number of exemplars

Note that the memory-aligned and memory-agnostic comparison protocols require calculating the number of exemplars for each method. Please refer to compute_exemplar.py for more details. The following are examples of calculating the number of exemplars for the fair and auc protocols.

Fair

bash run-fair.sh

AUC

python compute_exemplar.py -p auc
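
The idea behind aligning memory can be sketched as follows: memory that a method does not spend on model parameters can be spent on extra exemplars. The code below is an illustration under stated assumptions, not the logic of compute_exemplar.py: it assumes 4-byte float parameters and one byte per image value, the parameter count of 463,504 is an illustrative figure for a small backbone, and both helper names are hypothetical.

```python
def image_bytes(channels, height, width):
    """Bytes needed to store one image with one byte (uint8) per value."""
    return channels * height * width


def extra_exemplars(budget_params, model_params, bytes_per_image, bytes_per_param=4):
    """Number of exemplars that fit into the memory a smaller model leaves unused."""
    spare_bytes = (budget_params - model_params) * bytes_per_param
    if spare_bytes < 0:
        raise ValueError("model exceeds the memory budget")
    return spare_bytes // bytes_per_image


# Illustrative assumption: a backbone with 463,504 float32 parameters and
# 32x32 RGB images. A method that stores one backbone is aligned to a
# budget equal to two backbones.
cifar_image = image_bytes(3, 32, 32)  # 3072 bytes
print(extra_exemplars(2 * 463_504, 463_504, cifar_image))  # 603
```

Under these assumptions, one spare backbone's worth of memory corresponds to 603 additional 32x32 RGB exemplars.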

Acknowledgment

This repo is modified from PyCIL.

Correspondence

This repo is developed and maintained by Qi-Wei Wang and Zhi-Hong Qi. If you have any questions, please feel free to contact us by opening a new issue or by email: