/EMP-Net

[ECCV 2024] The Official implementation for 'Efficient Few-Shot Action Recognition via Multi-Level Post-Reasoning'.

Primary LanguagePython

EMP-Net

The Official implementation for Efficient Few-Shot Action Recognition via Multi-Level Post-Reasoning (ECCV 2024).

Prerequisite

Requirements:

  • Python>=3.6
  • torch>=1.5
  • torchvision (version corresponding with torch)
  • simplejson==3.11.1
  • decord>=0.6.0
  • pyyaml
  • einops
  • oss2
  • psutil
  • tqdm
  • pandas

optional requirements

  • fvcore (for flops calculation)

You can create environments simply with the following command:

conda env create -f environment.yaml

Data

First, you need to download the datasets from their original source and put them into data:

Then, prepare data according to the splits we provide.

Preprocessing Something-Something-V2 dataset

Following the setup of pytorch-video-understanding codebase, the video decoder decord has difficulty in decoding the original webm files. So we provide a script for preprocessing the .webm files in the original something-something-v2 dataset to .mp4 files. To do this, simply run:

python datasets/utils/preprocess_ssv2_annos.py --anno --anno_path path_to_your_annotation
python datasets/utils/preprocess_ssv2_annos.py --data --data_path path_to_your_ssv2_videos --data_out_path path_to_put_output_videos

Remember to make sure the annotation files are organized as follows:

-- path_to_your_annotation
    -- something-something-v2-train.json
    -- something-something-v2-validation.json
    -- something-something-v2-labels.json

Running

The entry file for all the runs are runs.

Before running, some settings need to be done.

For an example run of 1 shot on SSV2-Small, all configurations can be found from configs, you can modify any configurations as needed.

Then the codebase can be run by:

python runs/run.py --cfg ./configs/projects/EMP_Net/ssv2_small/EMP_Net_SSv2_Small_1shot_v1.yaml

Citation

If you find this model useful for your research, please use the following BibTeX entry.

@inproceedings{wuefficient,
  author={Wu, Cong and Wu, Xiao-Jun and Li, Linze and Xu, Tianyang and Feng, Zhenhua and Kittler, Josef},
  title={Efficient Few-Shot Action Recognition via Multi-Level Post-Reasoning},
  booktitle={European Conference on Computer Vision},
  year={2024},
  organization={Springer}
}

Acknowledgement

Thanks for the framework provided by CLIP-FSAR, which is source code of the published work CLIP-guided Prototype Modulating for Few-shot Action Recognition in IJCV 2023.

Contact

For any further questions, feel free to contact: congwu@jiangnan.edu.cn.