Primary language: Python. License: Apache-2.0.

Meta-Query-Net: Resolving Purity-Informativeness Dilemma in Open-set Active Learning (NeurIPS 2022, PDF)

by Dongmin Park¹, Yooju Shin¹, Jihwan Bang²˒³, Youngjun Lee, Hwanjun Song², Jae-Gil Lee¹

¹ KAIST, ² NAVER AI Lab, ³ NAVER CLOVA

  • Oct 19, 2022: Our work is publicly available on arXiv.
  • Dec 28, 2022: Our work was published at NeurIPS 2022.

How to run

MQ-Net

  • CIFAR10
python3 main_split.py --epochs 200 --epochs-csi 1000 --epochs-mqnet 100 --datset 'CIFAR10' --n-class 10 --n-query 500 \
                      --method 'MQNet' --mqnet-mode 'LL' --ssl-save True --ood-rate 0.6
  • CIFAR100
python3 main_split.py --epochs 200 --epochs-csi 1000 --epochs-mqnet 100 --datset 'CIFAR100' --n-class 100 --n-query 500 \
                      --method 'MQNet' --mqnet-mode 'LL' --ssl-save True --ood-rate 0.6
  • ImageNet50
python3 main_split.py --epochs 200 --epochs-csi 1000 --epochs-mqnet 100 --datset 'ImageNet50' --n-class 50 --n-query 1000 \
                      --method 'MQNet' --mqnet-mode 'LL' --ssl-save True --ood-rate 0.6
  • To speed up experiments, we provide CSI pre-trained models for the split experiment below
Noise Ratio | Architecture | CIFAR10 | CIFAR100
60%         | ResNet18     | weights | weights

Other Baselines: Uncertainty (CONF), CoreSet, LL, BADGE, CCAL, SIMILAR

  • CIFAR10, CIFAR100, ImageNet50
python3 main_split.py --epochs 200 --datset $dataset --n-query $num_query --method $al_algorithm --ood-rate $ood_rate

Requirements

torch: >=1.3.0
torchvision: 1.7.0
torchlars: 0.1.2
prefetch_generator: 1.0.1
submodlib: 1.1.5
diffdist: 0.1
scikit-learn: 0.24.2
scipy: 1.5.4

MQ-Net Overview

Importance of Handling OOD examples in Active Learning

  • OOD examples are uncertain in prediction & diverse in representation space
  • They are therefore likely to be queried by standard AL algorithms, e.g., uncertainty- and diversity-based ones
  • Since OOD examples are useless for the target task, querying them wastes labeling cost and significantly degrades AL performance

  • The figures above show AL performance on CIFAR10 mixed with SVHN at an open-set noise ratio of 50% (1:1 mixing)
  • With such a high noise ratio, uncertainty- and diversity-based algorithms query many OOD examples and thus perform even worse than random selection
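
To make the failure mode above concrete, here is a minimal sketch of least-confidence uncertainty sampling on a toy pool. The pool, the logits, and the helper names (`softmax`, `least_confidence`, `query_most_uncertain`) are illustrative assumptions and are not taken from this repository. OOD examples tend to produce flat predictions over the target classes, so they receive the highest uncertainty scores and fill the query set.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def least_confidence(probs):
    """Uncertainty score: 1 minus the largest class probability."""
    return 1.0 - max(probs)

# Hypothetical unlabeled pool: (name, is_ood, classifier logits over 3 target classes).
# The OOD rows have nearly flat logits, mimicking an unsure classifier.
pool = [
    ("in_easy", False, [6.0, 0.5, 0.2]),
    ("in_hard", False, [2.0, 1.6, 0.1]),
    ("ood_a",   True,  [0.9, 1.0, 1.1]),
    ("ood_b",   True,  [1.3, 1.0, 1.1]),
]

def query_most_uncertain(pool, k):
    """Return the k examples with the highest least-confidence score."""
    scored = [
        (least_confidence(softmax(logits)), name, is_ood)
        for name, is_ood, logits in pool
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]

selected = query_most_uncertain(pool, k=2)
selected_names = {name for _, name, _ in selected}
# Fraction of the labeling budget spent on OOD examples in this toy pool.
ood_fraction = sum(is_ood for _, _, is_ood in selected) / len(selected)
```

In this toy pool both queried examples are OOD, so the whole budget of two labels is spent on examples that cannot improve the target classifier.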

Purity-Informativeness Dilemma in Open-set Active Learning

  • Recently, two open-set AL algorithms, SIMILAR and CCAL, have been proposed to increase the in-distribution purity of the query set
  • However, "Should we focus on the purity throughout the entire AL period?" remains a question

  • Increasing purity ↔ Losing informativeness → Trade-off!
  • Which is more helpful: fewer but more informative examples, or more but less informative examples?
  • The optimal trade-off may change with the AL round and the noise ratio!
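
The trade-off above can be illustrated with a toy example. The sketch below ranks a hypothetical pool by a fixed linear mix of a purity score and an informativeness score. The weight `alpha`, the scores, and the helper names are assumptions made for illustration only; they are not parameters of this repository, and a fixed mix is exactly what the last bullet argues is insufficient.

```python
# Hypothetical pool: (is_in_distribution, purity_score, informativeness_score),
# with both scores assumed to lie in [0, 1].
pool = [
    (True,  0.95, 0.10),
    (True,  0.90, 0.20),
    (True,  0.60, 0.70),
    (True,  0.55, 0.80),
    (False, 0.20, 0.90),
    (False, 0.10, 0.95),
]

def select(pool, k, alpha):
    """Pick the top-k examples by alpha * purity + (1 - alpha) * informativeness."""
    ranked = sorted(
        pool,
        key=lambda ex: alpha * ex[1] + (1 - alpha) * ex[2],
        reverse=True,
    )
    return ranked[:k]

def query_purity(selected):
    """Fraction of selected examples that are in-distribution."""
    return sum(is_in for is_in, _, _ in selected) / len(selected)

def mean_informativeness(selected):
    """Average informativeness score of the selected examples."""
    return sum(info for _, _, info in selected) / len(selected)

# Sweep the hypothetical knob from "informativeness only" to "purity only".
results = {}
for alpha in (0.0, 0.5, 1.0):
    chosen = select(pool, k=2, alpha=alpha)
    results[alpha] = (query_purity(chosen), mean_informativeness(chosen))
```

With `alpha = 0.0` the query set is informative but entirely OOD, and with `alpha = 1.0` it is pure but barely informative. Which intermediate setting is best depends on the pool, which is why a fixed weight is a poor fit across rounds and noise ratios.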

MQ-Net

  • Goal: To keep finding the best balance between purity and informativeness
  • How: Learn a meta query score function
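
The README does not spell out the form of the meta query score function, so the following is only one plausible shape, not the repository's implementation. It sketches a tiny scorer that maps a (purity, informativeness) pair to a single query score, with weights clamped to be non-negative so that the score never decreases when either input increases. The class `TinyMetaScore`, the hand-set weights, and the pool are all hypothetical; a real scorer would be learned rather than hand-set.

```python
def relu(x):
    return x if x > 0 else 0.0

class TinyMetaScore:
    """Hypothetical two-layer scorer over (purity, informativeness)."""

    def __init__(self, w_hidden, b_hidden, w_out):
        # Clamp weights to be non-negative: with ReLU units this makes the
        # output non-decreasing in both purity and informativeness.
        self.w_hidden = [[max(w, 0.0) for w in row] for row in w_hidden]
        self.b_hidden = list(b_hidden)
        self.w_out = [max(w, 0.0) for w in w_out]

    def __call__(self, purity, informativeness):
        z = (purity, informativeness)
        hidden = [
            relu(sum(w * v for w, v in zip(row, z)) + b)
            for row, b in zip(self.w_hidden, self.b_hidden)
        ]
        return sum(w * h for w, h in zip(self.w_out, hidden))

def query(pool, score_fn, k):
    """Return the k pool entries with the highest meta score."""
    ranked = sorted(
        pool,
        key=lambda ex: score_fn(ex["purity"], ex["informativeness"]),
        reverse=True,
    )
    return ranked[:k]

# Hand-set weights for illustration only.
score_fn = TinyMetaScore(
    w_hidden=[[1.0, 0.5], [0.2, 1.5]],
    b_hidden=[-0.3, -0.4],
    w_out=[1.0, 0.8],
)

# Hypothetical pool with precomputed purity and informativeness scores.
pool = [
    {"name": "a", "purity": 0.90, "informativeness": 0.10},
    {"name": "b", "purity": 0.60, "informativeness": 0.70},
    {"name": "c", "purity": 0.10, "informativeness": 0.95},
]

top = query(pool, score_fn, k=1)
```

With these weights the balanced example `b` ranks first, ahead of the very pure but uninformative `a` and the informative but probably OOD `c`. Different weights would shift that balance, which is the degree of freedom a learned scorer can adjust between rounds.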

Result

Citation

@article{park2022meta,
  title={Meta-Query-Net: Resolving Purity-Informativeness Dilemma in Open-set Active Learning},
  author={Park, Dongmin and Shin, Yooju and Bang, Jihwan and Lee, Youngjun and Song, Hwanjun and Lee, Jae-Gil},
  journal={NeurIPS 2022},
  year={2022}
}

References

  • CoreSet: Active Learning for Convolutional Neural Networks: A Core-Set Approach, Sener et al., ICLR 2018
  • LL: Learning Loss for Active Learning, Yoo et al., CVPR 2019
  • BADGE: Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds, Ash et al., ICLR 2020
  • CCAL: Contrastive Coding for Active Learning under Class Distribution Mismatch, Du et al., ICCV 2021
  • SIMILAR: SIMILAR: Submodular Information Measures based Active Learning in Realistic Scenarios, Kothawade et al., NeurIPS 2021