This is the official implementation of "Towards Balanced Active Learning for Multimodal Classification".
- src
  - config (configuration files for each experiment)
  - dataset (dataset classes)
  - exp (PyTorch Lightning modules)
  - model (multimodal models)
  - run (PyTorch Lightning trainer)
  - strategy (active learning sampling strategies)
  - utils

Supported sampling strategies:

- bmmal
- random
- bald
- entropy
- coreset
- kmeans
- badge
- deepfool
- gcn
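For intuition, here is a minimal sketch of what the simplest of these strategies (entropy sampling) computes: score each unlabeled sample by predictive entropy and query the top-`budget`. This is not the repo's implementation; `model`, `unlabeled_loader`, and `budget` are hypothetical names.

```python
import torch

@torch.no_grad()
def entropy_query(model, unlabeled_loader, budget, device="cuda"):
    """Pick the `budget` unlabeled samples with the highest predictive entropy."""
    model.eval().to(device)
    scores = []
    for batch in unlabeled_loader:
        x = batch[0].to(device)                    # assumes (inputs, ...) batches
        probs = torch.softmax(model(x), dim=-1)    # class probabilities
        ent = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
        scores.append(ent.cpu())
    scores = torch.cat(scores)
    return torch.topk(scores, k=budget).indices    # indices into the unlabeled pool
```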
```bash
conda create -n mmal python=3.9
conda activate mmal
# we have only tested with PyTorch 1.13.1 and CUDA 11.6
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
# install the remaining dependencies with pip
pip install -r requirements.txt
```
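Optionally, a quick sanity check that the installed versions match the tested ones:

```python
import torch, torchvision

print(torch.__version__, torchvision.__version__)     # expect 1.13.1 / 0.14.1
print(torch.cuda.is_available(), torch.version.cuda)  # expect True / 11.6
```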
- UPMC_Food101
  - Download the dataset following the paper *Recipe Recognition with Large Multimodal Food Dataset*.
  - Run preprocess.py to generate the train.json and test.json files.
  - The final dataset file structure should look like the tree below (a sketch of reading one sample from this layout follows it):
```
UPMC_Food101
├── train.json
├── test.json
├── images
│   ├── train
│   │   └── label_name
│   │       ├── label_name_id.jpg
│   │       └── ...
│   └── test
│       └── label_name
│           ├── label_name_id.jpg
│           └── ...
└── texts_txt
    └── label_name
        ├── label_name_id.txt
        └── ...
```
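For orientation, a minimal sketch of reading one image-text pair from this layout; the function and the `sample_id` naming are ours, not the repo's dataset class:

```python
from pathlib import Path
from PIL import Image

root = Path("UPMC_Food101")

def load_pair(split: str, label_name: str, sample_id: str):
    """Read one (image, text, label) triple from the directory layout above.
    `sample_id` stands for the `label_name_id` file stem."""
    img = Image.open(root / "images" / split / label_name / f"{sample_id}.jpg").convert("RGB")
    text = (root / "texts_txt" / label_name / f"{sample_id}.txt").read_text(encoding="utf-8")
    return img, text, label_name
```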
- KineticsSound
  - Download the dataset following the Kinetics Datasets Downloader.
  - Run kinetics_convert_avi.py to convert the mp4 files into avi files.
  - Run kinetics_arrange_by_class.py to organize the files by class.
  - Run extract_wav_and_frames.py to extract the wav file and 10 jpg frames from each clip (a rough sketch of this step follows the tree below).
  - The final dataset file structure should look like:
```
kinetics_sound
├── my_train.txt
├── my_test.txt
├── train
│   ├── video
│   │   └── label_name
│   │       └── vid_start_end
│   │           ├── frame_0.jpg
│   │           ├── frame_1.jpg
│   │           ├── ...
│   │           └── frame_9.jpg
│   └── audio
│       └── label_name
│           ├── vid_start_end.wav
│           └── ...
└── test
    └── ...
```
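The extraction step amounts to two ffmpeg calls per clip: one for the audio track and one for the sampled frames. Below is a rough sketch of the idea, not the repo's extract_wav_and_frames.py; the 16 kHz mono sample rate and the frame-sampling flags are assumptions.

```python
import subprocess
from pathlib import Path

def extract_wav_and_frames(avi_path: str, out_dir: str, num_frames: int = 10):
    """Dump the audio track as wav and sample `num_frames` jpg frames with ffmpeg."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # audio track -> mono wav (the 16 kHz rate is an assumption)
    subprocess.run(["ffmpeg", "-y", "-i", avi_path, "-vn", "-ac", "1", "-ar", "16000",
                    str(out / "audio.wav")], check=True)
    # probe the clip duration, then sample frames evenly across it
    dur = float(subprocess.check_output(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", avi_path]).decode())
    subprocess.run(["ffmpeg", "-y", "-i", avi_path, "-vf", f"fps={num_frames / dur}",
                    "-frames:v", str(num_frames), "-start_number", "0",
                    str(out / "frame_%d.jpg")], check=True)
```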
- VGGSound
  - Download the dataset following VGGSound.
  - Run vggsound_convert_avi.py to convert the mp4 files into avi files.
  - Run extract_wav_and_frames.py to extract the wav file and 10 jpg frames from each clip, as for KineticsSound.
  - The final dataset file structure should look like:
```
vggsound
├── vggsound.csv
├── video
│   ├── train
│   │   └── label_name
│   │       └── vid_start_end.avi
│   └── test
│       └── ...
├── frames
│   ├── train
│   │   └── label_name
│   │       └── vid_start_end
│   │           ├── frame_0.jpg
│   │           ├── frame_1.jpg
│   │           ├── ...
│   │           └── frame_9.jpg
│   └── test
│       └── ...
└── audio
    ├── train
    │   └── label_name
    │       └── vid_start_end.wav
    └── test
        └── ...
```
```bash
cd mmal
export PYTHONPATH=$PWD
python run/runner.py -s {strategy} --seed {random_seed} -c {config_file} -d {cuda_device_index} -r {al_iteration}
```
Currently, experiments can only run on a single GPU.
To make sure each strategy begins from the same initialization, we highly recommend starting from a copy of the model trained with random sampling in iteration zero.
For example, to compare the performance of bmmal and badge:
```bash
# run the first iteration of active learning using random sampling
python run/runner.py -s random --seed 1000 -c config/food101.yml -d 0 -r 0
# keep a copy of the first iteration
mkdir -p logs/food101/food101-random-initialized-1000
cp -r logs/food101/food101-random-1000/version_0 logs/food101/food101-random-initialized-1000/version_0
cp logs/food101/food101-random-1000/task_model.ckpt logs/food101/food101-random-initialized-1000/task_model.ckpt
# make a copy renamed for bmmal
mkdir -p logs/food101/food101-bmmal-1000
cp -r logs/food101/food101-random-initialized-1000/version_0 logs/food101/food101-bmmal-1000/version_0
cp logs/food101/food101-random-initialized-1000/task_model.ckpt logs/food101/food101-bmmal-1000/task_model.ckpt
# run bmmal sampling and training for the second iteration
python run/runner.py -s bmmal --seed 1000 -c config/food101.yml -d 0 -r 1
# make a copy renamed for badge
mkdir -p logs/food101/food101-badge-1000
cp -r logs/food101/food101-random-initialized-1000/version_0 logs/food101/food101-badge-1000/version_0
cp logs/food101/food101-random-initialized-1000/task_model.ckpt logs/food101/food101-badge-1000/task_model.ckpt
# run badge sampling and training for the second iteration
python run/runner.py -s badge --seed 1000 -c config/food101.yml -d 0 -r 1
```
This way, different strategies can be compared fairly, starting from the same iteration-zero initialization.
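The copy-and-rename steps above can be scripted. A small helper along these lines (the function name and arguments are ours; the paths follow the example above):

```python
import shutil
from pathlib import Path

def init_strategy_from_random(log_root: str, dataset: str, strategy: str, seed: int):
    """Seed a strategy's logs and checkpoint from the shared random-initialized copy."""
    src = Path(log_root) / f"{dataset}-random-initialized-{seed}"
    dst = Path(log_root) / f"{dataset}-{strategy}-{seed}"
    dst.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src / "version_0", dst / "version_0", dirs_exist_ok=True)
    shutil.copy(src / "task_model.ckpt", dst / "task_model.ckpt")

init_strategy_from_random("logs/food101", "food101", "bmmal", 1000)
init_strategy_from_random("logs/food101", "food101", "badge", 1000)
```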
If the active learning loop runs for 5 rounds, version_0 through version_4 will store the logging files of each round. The log directory is organized as follows:
```
logs
└── {logger_save_dir}
    └── {dataset_name}-{strategy}-{random_seed}
        └── version_{al_iteration}
            ├── metrics.csv
            ├── al_metrics.csv   # metric values stored in csv format
            └── tf
                └── tf.events
```
```bash
tensorboard --logdir logs/{logger_save_dir}
```
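To aggregate al_metrics.csv across rounds programmatically, something like the following works; the logged column names depend on the experiment, so inspect the header first:

```python
import pandas as pd
from pathlib import Path

run_dir = Path("logs/food101/food101-bmmal-1000")
rounds = sorted(run_dir.glob("version_*/al_metrics.csv"))
df = pd.concat([pd.read_csv(p).assign(al_round=i) for i, p in enumerate(rounds)],
               ignore_index=True)
print(df.columns.tolist())            # inspect which metrics were logged
print(df.groupby("al_round").last())  # last logged value per AL round
```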
This repo is under the CC BY 4.0 license. See LICENSE for details.
If you find this code helpful, please cite our paper:
```bibtex
@article{shen2023towards,
  title={Towards Balanced Active Learning for Multimodal Classification},
  author={Shen, Meng and Huang, Yizheng and Yin, Jianxiong and Zou, Heqing and Rajan, Deepu and See, Simon},
  journal={arXiv preprint arXiv:2306.08306},
  year={2023}
}
```