Borrowing Knowledge From Pre-trained Language Model: A New Data-efficient Visual Learning Paradigm

Wenxuan Ma, Shuang Li, Jinming Zhang, Chi Harold Liu, Jingxuan Kang, Yulin Wang, and Gao Huang

Official implementation of our ICCV 2023 paper (BorLan).

Paradigm Introduction

BorLan is a simple data-efficient learning paradigm that includes three parts:

Obtain text embedding of task concepts via pre-trained language model (PLM). (This part can be conducted before the visual training once and for all for a given dataset.)
Main task loss (i.e., CrossEntropy)
Distribution alignment loss that leverages text embedding space to promote data-efficient visual training.

Training

Step 1: Obtain text embedding of concepts via PLM.

Run the following command to obtain text embeddings.

You need to modify the following things in the code:

classnames: List
save_name: str

# Bert-Large
python text_features/text_embedding.py

# GPT-2
python text_features/text_embedding_gpt.py

# CLIP ViT-Large
python text_features/text_embedding_clip.py

Step 2: Linguistic knowledge guided vision model training.

Run the following command for Semi-Supervised Learning tasks:

sh run.sh

Acknowledgement

This repository borrows codes from the following repos. Many thanks to the authors for their great work.

Self-Tuning: https://github.com/thuml/Self-Tuning

CoOp: https://github.com/KaiyangZhou/CoOp

Citation

If you find this project useful, please consider citing:

@inproceedings{ma2023borrowing,
  title={Borrowing Knowledge From Pre-trained Language Model: A New Data-efficient Visual Learning Paradigm},
  author={Ma, Wenxuan and Li, Shuang and Zhang, Jinming and Liu, Chi Harold and Kang, Jingxuan and Wang, Yulin and Huang, Gao},
  booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
  year={2023}
}

Contact

If you have any questions about our code, feel free to contact us or describe your problem in Issues.

Email address: wenxuanma@bit.edu.cn.

↥

BIT-DA/BorLan