
SPPA

This is the reference code for the work "Continual Semantic Segmentation via Structure Preserving and Projected Feature Alignment", published at ECCV 2022.


[Framework overview figure]

Requirements

The experiments are performed using the following libraries (an example install command follows the list):

  • Python (3.9)
  • Pytorch (1.9.1)
  • torchvision (0.10.1)
  • tensorboard (2.5.0)
  • numpy (1.21)
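
One possible way to install matching versions with pip (the package names below are the standard PyPI ones; a conda environment works equally well):

```bash
pip install torch==1.9.1 torchvision==0.10.1 tensorboard==2.5.0 numpy==1.21.0
```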

Datasets

In this work, we use VOC2012 and ADE20K. Both can be downloaded from their official websites.

Perform Training

The entry point is main.py. Simply run python main.py [options] to start training. The key parameters are listed below, with an example invocation after the list:

  • data_root: path to dataset folder
  • task: select the tasks defined in datasets/tasks.py
  • batch_size: the total batch size; each GPU receives batch_size / num_GPU samples
  • epochs: 30 for VOC and 60 for ADE20K
  • lr: the list of learning rates, one for each step defined in the task. For example, use [7e-3, 7e-4] for 15-5 VOC.
  • logging_path: path to log all training information
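
For example, a hypothetical 15-5 VOC run could look like the following (the flag spellings and the task name are assumptions; check utils/argparser.py and datasets/tasks.py for the exact ones):

```bash
python main.py \
    --data_root /path/to/VOC2012 \
    --task 15-5 \
    --epochs 30 \
    --lr 7e-3 7e-4 \
    --logging_path ./logs/voc_15-5
```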

All supported CLI parameters can be found in utils/argparser.py. Most can be left at their default values, but feel free to adjust them if needed.

If you want to run testing, specify the following parameters (an example follows the list):

  • ckpt: path for model checkpoint
  • ckpt_model_only: load model parameters only
  • test_only: perform testing
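
A hypothetical testing invocation, reusing the flags from the training example above (again, the exact spellings live in utils/argparser.py):

```bash
python main.py \
    --data_root /path/to/VOC2012 \
    --task 15-5 \
    --ckpt ./logs/voc_15-5/checkpoint.pth \
    --ckpt_model_only \
    --test_only
```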

Key components

The definition and training code of the projector module lie in modules/projector.py and modules/transnet.py. The projector is optimized using SGD with momentum and trained with a batch size of 32 for 1.5K iterations on VOC and 3K iterations on ADE20K. The learning rate is 1e-1 for the first 75% of iterations and 1e-2 for the rest.
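
A minimal sketch of that optimization schedule (the projector architecture, feature dimension, momentum value, and training target below are placeholders; the real module is in modules/projector.py):

```python
import torch
import torch.nn as nn

# Placeholder for the projector defined in modules/projector.py
# (the real architecture differs).
projector = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 256))

num_iters = 1500  # 1.5K iterations on VOC; use 3000 for ADE20K
optimizer = torch.optim.SGD(projector.parameters(), lr=1e-1, momentum=0.9)  # momentum value assumed
# lr = 1e-1 for the first 75% of iterations, then 1e-2 for the rest.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[int(0.75 * num_iters)], gamma=0.1)

for step in range(num_iters):
    feats = torch.randn(32, 256)    # batch size 32 (dummy features here)
    target = torch.randn(32, 256)   # stand-in for the alignment target
    loss = nn.functional.mse_loss(projector(feats), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```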

All the losses proposed in this work lie in utils/loss.py with detailed docstrings. They can be integrated into other code bases or tasks with little modification; a sketch of the pattern follows.
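
This illustrates the integration pattern only; the stand-in auxiliary term below is not one of the actual losses, whose real signatures are documented in utils/loss.py:

```python
import torch.nn as nn
import torch.nn.functional as F

ce = nn.CrossEntropyLoss(ignore_index=255)

def training_step(logits, labels, feats_new, feats_old, alpha=30.0):
    # Segmentation CE plus a weighted auxiliary term; the MSE here is a
    # placeholder for one of the losses defined in utils/loss.py.
    aux = F.mse_loss(feats_new, feats_old)
    return ce(logits, labels) + alpha * aux
```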

Hyper-parameters

The hyper-parameters of our method can be set as follows. Note that the best parameters may vary across datasets and setups.

  • L_ali: alpha can range from 10 to 100; we use 30.
  • L_str: beta can range from 1 to 100; we use 10. nu * beta can range from 1e-2 to 1e-1; we use 1e-1.
  • L_cont: gamma can range from 1e-3 to 1e-1; we use 1e-2. Usually mu = 1 works well.
  • pseudo label: T_c is selected to keep 80% of the raw pseudo labels (a threshold-selection sketch follows).
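
One way to realize such a threshold is a per-class confidence quantile, as in the sketch below (an illustration, not necessarily how the repository computes T_c):

```python
import torch

def select_threshold(confidences: torch.Tensor, keep_ratio: float = 0.8) -> torch.Tensor:
    """Pick T_c so that `keep_ratio` of the raw pseudo labels survive,
    i.e. the (1 - keep_ratio) quantile of the confidence scores."""
    return torch.quantile(confidences, 1.0 - keep_ratio)

conf = torch.rand(10_000)      # dummy per-pixel confidences for one class
t_c = select_threshold(conf)   # keeps the most confident 80%
mask = conf >= t_c             # pixels whose pseudo labels are retained
```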