Source Label Adaptation

The official PyTorch implementation of "Semi-Supervised Domain Adaptation with Source Label Adaptation", accepted to CVPR 2023.

Introduction
Demo
Setting up Python Environment
Data Preparation
Setting up Wandb
Running the model
Citation
Acknowledgement

Introduction

In this work, we present a general framework, Source Label Adaptation (SLA), for Semi-Supervised Domain Adaptation (SSDA). We introduce a novel source-adaptive paradigm that adapts the source data to match the target data. Our key idea is to view the source data as a noisily-labeled version of the ideal target data. Because this paradigm differs substantially from the core ideas behind existing SSDA approaches, our proposed model can be easily coupled with current state-of-the-art SSDA methods to further improve their performance. The illustration of the framework is shown below.

Click on the image below to watch the video explaining our work.

Demo

The demo below shows the results of the 6 methods implemented in this code on the 3-shot Office-Home A -> C case with seed 19980802.

For each method, we report the test accuracy at the iteration where the best evaluation accuracy was obtained. Under this criterion, applying our SLA method improves the base, mme, and cdac methods by +3.214%, +1.007%, and +2.183%, respectively.
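The selection rule above (take the test accuracy at the iteration with the best evaluation accuracy) can be sketched as follows. This is a minimal illustration, not the repository's logging code: the function name select_test_accuracy and the numbers in history are made up for the example.

```python
def select_test_accuracy(history):
    """Return (iteration, test_acc) at the iteration with the best evaluation accuracy.

    history is a list of (iteration, eval_acc, test_acc) tuples.
    On ties, the earliest iteration with the best evaluation accuracy wins.
    """
    best = max(history, key=lambda record: record[1])
    return best[0], best[2]


# Made-up numbers, for illustration only.
history = [
    (500, 60.1, 58.9),
    (1000, 63.4, 61.7),
    (1500, 62.8, 62.5),
]

iteration, test_acc = select_test_accuracy(history)
print(iteration, test_acc)
```

Note that the reported value is not the highest test accuracy in the run (62.5 in this toy history), because the iteration is chosen from the evaluation accuracy alone.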

See more demos on our demo page.

More experimental results can be found in our main paper.

Setting up Python Environment

Use conda to create a new environment by running the following command:

conda env create --name <env_name> --file environment.yaml

Replace <env_name> with the desired name of your new environment. This command will create a new environment with Python version 3.10.10 and install all the required packages specified in the environment.yaml file.

Compatible PyTorch version

The environment file specifies PyTorch version 2.0. Empirically, it has been shown to speed up training.

However, the code does not use any PyTorch 2.0 features and should be compatible with older versions of PyTorch, such as version 1.12.0.
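To see which PyTorch version an existing environment provides, a small check like the one below may help. It is only a sketch: OLDEST_NAMED and version_tuple are names introduced here, and 1.12 is simply the oldest version this README mentions, not a tested minimum.

```python
# Oldest PyTorch release named in this README as expected to work.
# This reflects the text above; it is not a verified minimum requirement.
OLDEST_NAMED = (1, 12)


def version_tuple(version):
    """Turn a version string such as '2.0.1+cu117' into (major, minor)."""
    core = str(version).split("+")[0]  # drop a local build tag like '+cu117'
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


try:
    import torch
    installed = version_tuple(torch.__version__)
except ImportError:
    installed = None

if installed is None:
    print("PyTorch is not installed in this environment.")
elif installed < OLDEST_NAMED:
    print(f"PyTorch {installed} is older than any version this README mentions.")
else:
    print(f"PyTorch {installed} is within the range this README mentions.")
```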

Data Preparation

Supported Datasets

Currently, we support the following datasets:

Dataset Architecture

The dataset is organized into directories, as shown below:

- dataset_dir
    - dataset_name
        - domain 1
        - ...
        - domain N
        - text
            - domain 1
                - all.txt
                - train_1.txt
                - train_3.txt
                - test_1.txt
                - test_3.txt
                - val.txt
            - ...
            - domain N
    - ...
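As a quick sanity check of the layout above, the following sketch lists any domain folder or split file that is missing. The function missing_split_files is a hypothetical helper and not part of this repository, and the domain names in the usage line are placeholders for your dataset's real domain folder names.

```python
from pathlib import Path

# Split files expected under text/<domain>/, taken from the layout above.
EXPECTED_FILES = (
    "all.txt",
    "train_1.txt",
    "train_3.txt",
    "test_1.txt",
    "test_3.txt",
    "val.txt",
)


def missing_split_files(dataset_dir, dataset_name, domains):
    """Return the paths from the expected layout that do not exist."""
    root = Path(dataset_dir) / dataset_name
    missing = []
    for domain in domains:
        # Image folder for this domain.
        if not (root / domain).is_dir():
            missing.append(root / domain)
        # Text files that describe the splits for this domain.
        for name in EXPECTED_FILES:
            path = root / "text" / domain / name
            if not path.is_file():
                missing.append(path)
    return missing


# Placeholder arguments: replace them with your own directory and domain names.
for path in missing_split_files("/path/to/dataset", "OfficeHome", ["domain_a", "domain_b"]):
    print("missing:", path)
```

An empty result means every listed domain has its image folder and all six split files.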

Download and Preparation

Before running the data preparation script, make sure to update the configuration file in data_preparation/dataset.yaml with the correct settings for your dataset. In particular, you will need to update the dataset_dir variable to point to the directory where your dataset is stored.

dataset_dir: /path/to/dataset

To download and prepare one of these datasets, run the following commands:

cd data_preparation
python data_preparation.py --dataset <DATASET>

Replace <DATASET> with the name of the dataset you want to prepare (e.g. DomainNet, OfficeHome). This script will download the dataset (if necessary) and extract the text files that specify how the data is split into training, validation, and test sets. The resulting data will be saved in the format described above.

After running the data preparation script, you should be able to use the resulting data files in this repository.

Setting up Wandb

We use Wandb to record our experimental results. Check here for more details. The code will prompt you to log in to your Wandb account.

Running the model

Baseline methods

To run the main Python file, use the following command:

python main.py --method mme --dataset OfficeHome --source 0 --target 1 --seed 1102 --num_iters 10000 --shot 3shot

This command runs the MME model on the 3-shot A -> C Office-Home dataset, with the specified hyperparameters. You can modify the command to run different experiments with different hyperparameters or on different datasets.

The following methods are currently supported:

base: Uses S+T as described in our main paper.
mme: Uses MME as described in this paper.
cdac: Uses CDAC as described in this paper.

Applying SLA to baseline methods

To apply our proposed SLA method, append the suffix "_SLA" to the selected method. For example:

python main.py --method mme_SLA --dataset OfficeHome --source 0 --target 1 --seed 1102 --num_iters 10000 --shot 3shot --alpha 0.3 --update_interval 500 --warmup 500 --T 0.6

This command runs the MME + SLA model on the 3-shot A -> C Office-Home dataset, with the specified hyperparameters. Check our main paper to find the recommended hyperparameters for each method on each dataset.
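To run several seeds or to compare a baseline with its SLA variant, the commands can be generated instead of typed by hand. The sketch below only builds and prints command lines from the flags shown in this README. The helper build_command is hypothetical, and the SLA values are copied from the example command, so they are not recommended settings for other methods or datasets.

```python
import shlex


def build_command(method, dataset, source, target, seed,
                  num_iters=10000, shot="3shot", sla_args=None):
    """Build a main.py command line; passing sla_args appends the "_SLA" suffix."""
    name = method + "_SLA" if sla_args is not None else method
    cmd = [
        "python", "main.py",
        "--method", name,
        "--dataset", dataset,
        "--source", str(source),
        "--target", str(target),
        "--seed", str(seed),
        "--num_iters", str(num_iters),
        "--shot", shot,
    ]
    # SLA-specific flags are only added for the SLA variant.
    for flag, value in (sla_args or {}).items():
        cmd += [f"--{flag}", str(value)]
    return cmd


# Values taken from the example command above.
sla = {"alpha": 0.3, "update_interval": 500, "warmup": 500, "T": 0.6}

for seed in (1102, 19980802):
    print(shlex.join(build_command("mme", "OfficeHome", 0, 1, seed)))
    print(shlex.join(build_command("mme", "OfficeHome", 0, 1, seed, sla_args=sla)))
```

Each printed line can be pasted into a shell, or the list returned by build_command can be passed to subprocess.run to launch the run directly.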

Citation

If you find our work useful, please cite it using the following BibTeX entry:

@InProceedings{Yu_2023_CVPR,
    author    = {Yu, Yu-Chu and Lin, Hsuan-Tien},
    title     = {Semi-Supervised Domain Adaptation With Source Label Adaptation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {24100-24109}
}

Acknowledgement

This code is partially based on MME, CDAC and DeepDA.

The backup URLs for OfficeHome and Office31 are provided here.
