Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning

This repository is the official implementation of Side4Video, which significantly reduces the training memory cost for action recognition and text-video retrieval tasks.


📰 News

🗺️ Overview


🚀 Training and Testing

For training and testing our model, please refer to the Recognition and Retrieval folders.

📊 Results

Our best model achieves an accuracy of 67.3% and 74.6% on Something-Something V1 and V2, respectively, and 88.6% on Kinetics-400. For text-video retrieval, it reaches a Recall@1 of 52.3% on MSR-VTT, 56.1% on MSVD, and 68.8% on VATEX.
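For readers unfamiliar with the retrieval metric, here is a minimal sketch of how Recall@K is commonly computed from a text-to-video similarity matrix. It is an illustration only, not the evaluation code in the Retrieval folder: the function name `recall_at_k`, the toy scores, and the assumption that query `i` matches video `i` (one caption per video) are all hypothetical.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int = 1) -> float:
    """Fraction of text queries whose matching video ranks in the top k.

    Assumption: similarity[i][j] scores text query i against video j, and
    the correct video for query i sits at index i.
    """
    if not similarity:
        raise ValueError("similarity matrix is empty")
    hits = 0
    for i, row in enumerate(similarity):
        # Rank = number of videos scored strictly higher than the true match
        # (ties are resolved in favour of the true match).
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(similarity)


# Toy 3x3 example: queries 0 and 2 retrieve their own video first,
# query 1 ranks its own video second.
toy_scores = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.2, 0.4, 0.7],
]
print(f"Recall@1 = {recall_at_k(toy_scores, k=1):.3f}")
print(f"Recall@2 = {recall_at_k(toy_scores, k=2):.3f}")
```

Datasets with several captions per video need a mapping from each query to its ground-truth video instead of the diagonal assumption used here.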

🖇️ Citation

If you find this repository useful, please star🌟 this repo and cite🖇️ our paper.

@article{yao2023side4video,
  title={Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning},
  author={Yao, Huanjin and Wu, Wenhao and Li, Zhiheng},
  journal={arXiv preprint arXiv:2311.15769},
  year={2023}
}

Acknowledgment

Our implementation is mainly based on the following codebases. We are sincerely grateful for their work.

  • Text4Vis: Revisiting Classifier: Transferring Vision-Language Models for Video Recognition.
  • CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval.

📧 Contact

If you have any questions about this repository, please file an issue or contact Huanjin Yao or Wenhao Wu.