This repo (built upon the amazing codebase of mammoth) contains the code for our NeurIPS 2023 paper:
A Unified Approach to Domain Incremental Learning with Memory: Theory and Algorithm
Haizhou Shi, Hao Wang
Thirty-seventh Conference on Neural Information Processing Systems, 2023
[Paper] [OpenReview] [Slides] [Talk (Youtube)] [Talk (Bilibili)]
- How does UDIL unify existing methods?
- How does UDIL lead to a tighter bound?
- Installing the required packages
- Code for running UDIL
- Quantitative Results
- Related Work
- References
Long story short, in the paper we start by restating the learning objective of domain-incremental learning (which also holds for other types of continual learning). We then propose to combine three ways of upper-bounding the past-domain error (ERM, the intra-domain bound, and the cross-domain bound; see Section 3 of the paper) and assign an adaptive coefficient to each of the resulting training terms.
Here is the main theorem of our paper, which not only unifies the current domain-incremental learning methods but also makes it possible to minimize a tighter bound, as described in the next section.
The first main argument of our work is that, by fixing the values of the coefficients, existing domain-incremental learning methods can be recovered as special cases of UDIL.
A natural question following the unification is: can we do better than using a single set of fixed coefficients to train a domain-incremental learning model? The answer is a firm YES. What we do in this work is parameterize the coefficients and optimize a tighter bound by adjusting them during model training. We know you are in a hurry, so here is an extremely brief review of how we form the final training objective.
As you can see, there are in total four kinds of differentiable loss terms in our proposed algorithm:
- 🔵 Cross-Entropy Classification Loss: it corresponds to the simple ERM terms on the current data and the memory.
- 🟢 Cross-Entropy Distillation Loss: it corresponds to the distillation loss terms between the current model $h$ and the history model $H_{t-1}$, computed on the current data and the memory.
- 🔴 Adversarial Feature Alignment Loss: it corresponds to the divergence terms between the current data distribution and the past data distribution. If you are interested in how minimizing this term on the feature space can improve the performance in general, please refer to the amazing work "A theory of learning from different domains".
- ⚪ Adaptive Coefficient Optimization: it corresponds to estimating the error (classification accuracy) of each term, and adaptively minimizing the coefficient set $\Omega=\{\alpha_i, \beta_i, \gamma_i\}$.
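The terms above can be illustrated with a small, dependency-free Python sketch. It is not the implementation in this repo: the helper names (`softmax`, `cross_entropy`, `distillation_loss`, `simplex_coefficients`, `weighted_objective`) are hypothetical, the adversarial alignment term is passed in as a placeholder number instead of being trained with a discriminator, and normalizing the three coefficients with a softmax so that they are positive and sum to one is an assumption made purely for illustration. Which coefficient multiplies which term is defined in the paper, not here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Classification loss: negative log-probability of the true label."""
    return -math.log(softmax(logits)[label])

def distillation_loss(student_logits, teacher_logits):
    """Cross-entropy between the teacher's soft predictions and the student's."""
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

def simplex_coefficients(raw):
    """Map unconstrained numbers to positive weights that sum to 1 (assumption)."""
    return tuple(softmax(raw))

def weighted_objective(coeffs, terms):
    """Weighted sum of loss terms; the pairing of weights and terms is up to the caller."""
    if len(coeffs) != len(terms):
        raise ValueError("need exactly one coefficient per loss term")
    return sum(c * t for c, t in zip(coeffs, terms))

# Toy usage with made-up logits for a 3-class problem.
current_logits = [2.0, 0.5, -1.0]   # output of the current model (hypothetical)
history_logits = [1.5, 0.7, -0.8]   # output of the history model (hypothetical)
erm = cross_entropy(current_logits, label=0)
distill = distillation_loss(current_logits, history_logits)
align = 0.1                         # placeholder for the alignment term
coeffs = simplex_coefficients([0.0, 0.0, 0.0])  # equal weights to start with
total = weighted_objective(coeffs, (erm, distill, align))
print(total)
```

Because the weights come from unconstrained parameters, they could in principle be adjusted alongside the model during training while staying on the simplex, which is the general idea behind treating the coefficients as learnable rather than fixed.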
conda create -n udil python=3.9
conda activate udil
conda install pytorch==1.12 torchvision cudatoolkit=11.3 -c pytorch
conda install wandb ipdb -c conda-forge
Before you run the code, there are a couple of settings you might want to modify:
- wandb_entity: at utils/args.py line 70, change to your own wandb account;
- data_path and base_path: at utils/conf line 13-23, change to whatever path you want to store your data and local training logs.
We provide the commands for running UDIL on different datasets in the /scripts folder.
Once everything is set up, a quick example of running UDIL on Permutated-MNIST is shown as follows:
chmod +x scripts/*.sh
scripts/pmnist.sh
This script will start a UDIL training process and log everything to your wandb project.
If you are in a hurry and just want a quick look at the training process and final results of UDIL on three different realistic datasets (Permutated-MNIST, Rotated-MNIST, and Seq-CORe50), check out the public UDIL wandb project, where we have visualized everything you might care about!
Here we provide some quantitative results of UDIL.
Here we provide some qualitative results of UDIL. They come from the public UDIL wandb project, and we show only the results on Rotated-MNIST.
Accuracy Matrix after 20-Domain Training
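An accuracy matrix of this kind is usually read row by row: entry (t, i) is the test accuracy on domain i after training on domain t has finished. The sketch below assumes that convention (the layout used in this repo's plots may differ), uses made-up numbers for three domains, and introduces the hypothetical helpers `average_accuracy` and `average_forgetting` to compute two common continual-learning summaries.

```python
def average_accuracy(acc):
    """Mean accuracy over all domains after the last training stage (last row)."""
    last = acc[-1]
    return sum(last) / len(last)

def average_forgetting(acc):
    """Mean drop from each earlier domain's best accuracy to its final accuracy."""
    num_domains = len(acc)
    if num_domains < 2:
        return 0.0
    drops = []
    for i in range(num_domains - 1):
        # Domain i is first seen at stage i; search its best accuracy
        # over stages i .. T-2, then compare with the final stage T-1.
        best = max(acc[t][i] for t in range(i, num_domains - 1))
        drops.append(best - acc[-1][i])
    return sum(drops) / len(drops)

# Made-up numbers: rows are training stages, columns are evaluation domains.
# Entries above the diagonal (domains not yet seen) are not used here.
toy = [
    [0.98, 0.00, 0.00],
    [0.95, 0.97, 0.00],
    [0.93, 0.96, 0.97],
]
print(average_accuracy(toy))
print(average_forgetting(toy))
```

A model that retains past domains well shows a last row close to the diagonal entries, which corresponds to a small forgetting value.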
Below are visualizations of the embedding distributions of different classes and domains, where:
- Left: colors represent different true classes;
- Middle: colors represent different predicted classes by the model;
- Right: colors represent different domains.
Embedding Space Visualization after 1-Domain Training
Embedding Space Visualization after 20-Domain Training
[1] Domain-Indexing Variational Bayes: Interpretable Domain Index for Domain Adaptation
Zihao Xu*, Guang-Yuan Hao*, Hao He, Hao Wang
Eleventh International Conference on Learning Representations (ICLR), 2023
[Paper] [OpenReview] [PPT] [Talk (Youtube)] [Talk (Bilibili)]
[2] Graph-Relational Domain Adaptation
Zihao Xu, Hao He, Guang-He Lee, Yuyang Wang, Hao Wang
Tenth International Conference on Learning Representations (ICLR), 2022
[Paper] [Code] [Talk] [Slides]
[3] Continuously Indexed Domain Adaptation
Hao Wang*, Hao He*, Dina Katabi
Thirty-Seventh International Conference on Machine Learning (ICML), 2020
[Paper] [Code] [Talk] [Blog] [Slides] [Website]
A Unified Approach to Domain Incremental Learning with Memory: Theory and Algorithm
@inproceedings{UDIL,
title={A Unified Approach to Domain Incremental Learning with Memory: Theory and Algorithm},
author={Shi, Haizhou and Wang, Hao},
booktitle={Advances in Neural Information Processing Systems},
year={2023}
}