
Source code for the Open-World Semi-Supervised Learning paper, with detailed line-by-line explanations of the code added.


Open-World Semi-Supervised Learning

Kaidi Cao*, Maria Brbić*, Jure Leskovec

Project website


This repository contains the PyTorch reference implementation of the ORCA algorithm, together with detailed explanations of the code. ORCA is a pipeline that simultaneously recognizes previously seen classes and discovers novel, never-before-seen classes. For more details, please see our paper Open-World Semi-Supervised Learning (ICLR '22).
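To make "recognizes seen classes and discovers novel classes at the same time" concrete, here is a minimal, hypothetical sketch of how one classifier output can serve both purposes. It assumes a single head with one output per class, where the first `num_seen` outputs are the labeled (seen) classes and the remaining outputs are reserved for novel classes. `route_prediction` is an illustrative helper, not code from this repository. A novel index is only an unnamed cluster id, so it has to be matched to ground-truth classes before any accuracy can be computed.

```python
def route_prediction(logits, num_seen):
    """Label one sample's prediction as a seen class or a novel cluster.

    Assumption (hypothetical layout): indices 0..num_seen-1 are seen
    classes; every higher index is a slot for a novel, unlabeled class.
    """
    # Arg-max over the raw scores, without any external library.
    pred = max(range(len(logits)), key=lambda i: logits[i])
    kind = "seen" if pred < num_seen else "novel"
    return kind, pred


print(route_prediction([0.1, 2.3, 0.4, 0.2], num_seen=2))  # → ('seen', 1)
print(route_prediction([0.1, 0.3, 0.4, 1.7], num_seen=2))  # → ('novel', 3)
```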

Dependencies

The code is built with the following libraries:

Usage

Get Started

We use SimCLR for pretraining. The pretrained weights used in our paper can be downloaded from this link.

  • To train on CIFAR-100, run
python orca_cifar.py --dataset cifar100 --labeled-num 50 --labeled-ratio 0.5
  • To train on ImageNet-100, first run gen_imagenet_list.py to generate the corresponding split lists, then run
python orca_imagenet.py --labeled-num 50 --labeled-ratio 0.5
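The `--labeled-num` and `--labeled-ratio` flags in the commands above control how the data is split. The sketch below is a minimal illustration under an assumption that has not been checked against the repository's dataset code: `--labeled-num` marks the first N classes as seen, and `--labeled-ratio` is the fraction of seen-class samples that keep their labels. Every other sample, including all samples of novel classes, goes to the unlabeled set. `split_open_world` is a hypothetical helper, not a function from this repository.

```python
import random


def split_open_world(targets, labeled_num, labeled_ratio, seed=0):
    """Split sample indices into labeled and unlabeled index lists.

    Assumption: classes 0..labeled_num-1 are "seen"; each of their samples
    is labeled with probability labeled_ratio. Samples of the remaining
    ("novel") classes are always unlabeled.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    labeled, unlabeled = [], []
    for idx, label in enumerate(targets):
        if label < labeled_num and rng.random() < labeled_ratio:
            labeled.append(idx)
        else:
            unlabeled.append(idx)
    return labeled, unlabeled


# Toy example: 4 classes with 10 samples each; classes 0 and 1 are seen,
# mirroring "--labeled-num 2 --labeled-ratio 0.5" on a tiny scale.
targets = [c for c in range(4) for _ in range(10)]
labeled, unlabeled = split_open_world(targets, labeled_num=2, labeled_ratio=0.5)
print(len(labeled), len(unlabeled))
```

With the paper's CIFAR-100 settings this reading means that half of the classes are seen, and about half of each seen class's images carry labels during training.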

If the dataset fails to download automatically, download it manually and place it in ./dataset.

Citing

If you find our code useful, please consider citing:

@inproceedings{
    cao2022openworld,
    title={Open-World Semi-Supervised Learning},
    author={Kaidi Cao and Maria Brbic and Jure Leskovec},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=O-r8LOR-CCA}
}