/iwa

A new method for ensemble learning in unsupervised domain adaptation with asymptotically optimal error rate.

Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation [Paper]

A PyTorch suite to systematically evaluate different domain adaptation methods.

Requirements:

  • Python3
  • PyTorch==1.7
  • Numpy==1.20.1
  • scikit-learn==0.24.1
  • Pandas==1.2.4
  • skorch==0.10.0 (for DEV risk calculations)
  • openpyxl==3.0.7 (for classification reports)
  • Wandb==0.12.7
  • Hydra==1.2.0
  • OmegaConf==2.2.3

Installing

  1. Clone the repository
git clone git@github.com:<repo>
cd bpda
  2. Create a Python 3 conda environment
conda env create -f environment.yml
  3. Ensure that all required temp directories are available
  • data

Datasets

Available Datasets

We used four public datasets in this study. We also provide the preprocessed versions as follows:

Adding New Dataset

Structure of data

To add a new dataset (e.g., NewData), place it in a folder named NewData inside the datasets directory.

Because NewData contains several domains, split each domain into train and test sets and save them as "train_x.pt" and "test_x.pt", where x identifies the domain.

Each data file should be a dictionary of the form train_x.pt = {"samples": data, "labels": labels}, and similarly for test_x.pt.
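
As a minimal sketch of the layout described above, the following Python builds the expected file name and dictionary for one domain. The helper names (split_filename, make_split, save_split), the domain identifier "a", and the toy values are assumptions made for illustration and are not part of the repository. Only the "samples" and "labels" keys and the train_x.pt / test_x.pt naming come from the description above.

```python
from pathlib import Path


def split_filename(split, domain_id):
    """Return the file name for one split of one domain, e.g. 'train_a.pt'."""
    if split not in ("train", "test"):
        raise ValueError(f"unknown split: {split!r}")
    return f"{split}_{domain_id}.pt"


def make_split(samples, labels):
    """Bundle samples and labels into the dictionary form described above."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    return {"samples": samples, "labels": labels}


def save_split(split_dict, dataset_dir, split, domain_id):
    """Write one split with torch.save and return the path that was written."""
    import torch  # imported lazily so the helpers above work without PyTorch

    path = Path(dataset_dir) / split_filename(split, domain_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    torch.save(split_dict, path)
    return path


# Toy data for a hypothetical domain "a": two samples with three values each.
train_a = make_split([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], [0, 1])
target = Path("datasets") / "NewData" / split_filename("train", "a")
```

In practice the samples and labels would typically be tensors or arrays rather than lists. Under these assumptions, calling save_split(train_a, "datasets/NewData", "train", "a") would write the training split for domain "a".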

Configurations

Next, add a class named NewData to the configs/data_model_configs.py file, using the classes for the existing datasets as a guide. Also specify the cross-domain scenarios in the self.scenarios variable.

Finally, add another class named NewData to the configs/hparams.py file to specify the training parameters.
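
The two classes could look roughly like the sketch below. Only the class name NewData and the self.scenarios attribute come from the text above. The representation of scenarios as (source, target) pairs, the domain identifiers, and every other attribute name and value (class_names, num_classes, train_params) are hypothetical placeholders, so the existing classes in both files remain the guide to follow.

```python
class NewData:
    """Dataset configuration; would live in configs/data_model_configs.py."""

    def __init__(self):
        # Cross-domain scenarios, assumed here to be (source, target) pairs.
        # The domain identifiers "a", "b", "c" are made up for this example.
        self.scenarios = [("a", "b"), ("b", "c"), ("c", "a")]
        # Hypothetical dataset properties; the real attribute names may differ.
        self.class_names = ["class_0", "class_1"]
        self.num_classes = len(self.class_names)


class NewDataHparams:
    """Training parameters; in configs/hparams.py this class would also be
    named NewData. It is renamed here only so both fit in one snippet."""

    def __init__(self):
        # Hypothetical training parameters.
        self.train_params = {
            "num_epochs": 10,
            "batch_size": 32,
            "learning_rate": 1e-3,
        }


config = NewData()
for src, trg in config.scenarios:
    print(f"{src} -> {trg}")
```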

Domain Adaptation Algorithms

Existing Algorithms

Adding New Algorithm

To add a new algorithm, place it in the algorithms/algorithms.py file.

Training procedure

To train the models, run:

./run.sh

To collect the results, run:

./collect_results.sh

Upper and Lower bounds

The main trainer file is trainer.py; when executed, it also produces the source-only results.

Results

  • Each run stores the results of all cross-domain scenarios in directories named runx_src_to_trg, where x is the run_id.
  • Each directory contains the classification report, a log file, a checkpoint, and the different risk scores.
  • Once all runs have finished, the overall average and std results are available in the run directory.
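
To illustrate the naming scheme above, here is a small sketch that parses directory names of the form runx_src_to_trg and averages a score per scenario across runs. The function names, the regular expression, and the scores are assumptions made for this example. The repository's own collection script may compute its summary differently, for instance with a population rather than a sample standard deviation.

```python
import re
from collections import defaultdict
from statistics import mean, stdev

# Assumed pattern: "run", the run_id, then "<src>_to_<trg>".
RUN_DIR = re.compile(r"^run(?P<run_id>\d+)_(?P<src>.+?)_to_(?P<trg>.+)$")


def parse_run_dir(name):
    """Split a directory name such as 'run0_a_to_b' into (run_id, src, trg)."""
    match = RUN_DIR.match(name)
    if match is None:
        raise ValueError(f"unexpected directory name: {name!r}")
    return int(match["run_id"]), match["src"], match["trg"]


def summarize(scores_by_dir):
    """Return {(src, trg): (mean, std)} of a score over all runs."""
    grouped = defaultdict(list)
    for name, score in scores_by_dir.items():
        _, src, trg = parse_run_dir(name)
        grouped[(src, trg)].append(score)
    # stdev needs at least two values, so single-run scenarios get 0.0.
    return {
        scenario: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for scenario, values in grouped.items()
    }


# Made-up scores for illustration only; these are not results from the paper.
example = {"run0_a_to_b": 0.5, "run1_a_to_b": 1.0, "run0_b_to_c": 0.25}
summary = summarize(example)
```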

References

Citation

@inproceedings{
  IWA23,
  title={Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation},
  author={Dinu, Marius-Constantin and Beck, Maximilian and Nguyen, Duc Hoan and Huber, Andrea and Eghbal-zadeh, Hamid and Moser, Bernhard A. and Pereverzyev, Sergei V. and Hochreiter, Sepp and Zellinger, Werner},
  booktitle={Submitted to The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=M95oDwJXayG},
  note={under review}
}