
Learning Emotion Representations from Verbal and Nonverbal Communication (CVPR 2023)

This is a PyTorch implementation of the EmotionCLIP paper:

@inproceedings{zhang2023learning,
  title={Learning Emotion Representations from Verbal and Nonverbal Communication},
  author={Zhang, Sitao and Pan, Yimu and Wang, James Z},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={18993--19004},
  year={2023}
}

Overview

Overview of the EmotionCLIP framework (figure).

Requirements

The code is built with the following libraries (a sample install command is given after the list):

  • pytorch
  • scikit-learn
  • einops
  • ftfy
  • regex
  • pandas
  • orjson
  • h5py
  • wandb
  • tqdm
  • rich
  • termcolor
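
All of the non-PyTorch dependencies are available from PyPI. A minimal install sketch, assuming PyTorch itself is installed separately following the official instructions for your CUDA or CPU setup (versions are not pinned here):

pip install scikit-learn einops ftfy regex pandas orjson h5py wandb tqdm rich termcolor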

Extra setup is required for data preprocessing. Please refer to preprocessing.

Usage

Testing

The pre-trained EmotionCLIP model can be downloaded here. We follow the linear-probe evaluation protocol employed in CLIP. To test the pre-trained model on a specific dataset, run

python linear_eval.py \
  --dataset <dataset_name> \
  --ckpt-path <path_to_the_pretrained_model>
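
For orientation, CLIP's linear-probe protocol freezes the encoder, extracts features for every sample, and fits an L2-regularized logistic regression on top of them. The sketch below illustrates that protocol only; the extract_features helper and the encode_video call are hypothetical placeholders, not the exact interface of linear_eval.py:

import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def extract_features(model, loader, device="cuda"):
    # Hypothetical helper: run the frozen encoder over a dataloader and
    # collect features and labels as numpy arrays.
    feats, labels = [], []
    for video, label in loader:
        emb = model.encode_video(video.to(device))  # assumed encoder entry point
        feats.append(emb.cpu().numpy())
        labels.append(label.numpy())
    return np.concatenate(feats), np.concatenate(labels)

# Linear probe on the frozen features, mirroring CLIP's protocol
# (the regularization strength C is normally chosen by a validation sweep):
#   train_x, train_y = extract_features(model, train_loader)
#   test_x, test_y = extract_features(model, test_loader)
#   clf = LogisticRegression(max_iter=1000).fit(train_x, train_y)
#   print("linear-probe accuracy:", clf.score(test_x, test_y))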

Training

We use the weights provided by OpenCLIP as the starting point for our training. Please download the weights here and put them under src/pretrained. To train on the YouTube video dataset with the default settings, run

python main.py \
  --video-path <path_to_the_video_frames_folder> \
  --caption-path <path_to_the_video_caption_folder> \
  --sentiment-path <path_to_the_sentiment_logits_file> \
  --index-path <path_to_the_index_file>
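
Since the code builds on CLIP/OpenCLIP, training is organized around a symmetric video-text contrastive objective. The sketch below shows only the plain CLIP-style loss for orientation; it is not the repository's exact objective, which also makes use of the sentiment logits passed via --sentiment-path (see the paper for details):

import torch
import torch.nn.functional as F

def clip_contrastive_loss(video_emb, text_emb, logit_scale):
    # Symmetric InfoNCE over a batch of paired video/text embeddings,
    # as in CLIP/OpenCLIP. EmotionCLIP's full objective differs; this is
    # only a reference implementation of the underlying contrastive loss.
    video_emb = F.normalize(video_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = logit_scale * video_emb @ text_emb.t()
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))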

Acknowledgments

Our code is based on CLIP and OpenCLIP.