EMP-Net

The official implementation of Efficient Few-Shot Action Recognition via Multi-Level Post-Reasoning (ECCV 2024).

Prerequisite
Data
Running
Citation
Acknowledgement
Contact

Prerequisite

Requirements:

Python>=3.6
torch>=1.5
torchvision (version matching torch)
simplejson==3.11.1
decord>=0.6.0
pyyaml
einops
oss2
psutil
tqdm
pandas

Optional requirements:

fvcore (for flops calculation)

You can create the environment with the following command:

conda env create -f environment.yaml
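To confirm that the environment is complete before training, a quick import check can help. The following is a minimal sketch and not part of the repository: the helper name `missing_packages` is hypothetical, and the import names are assumed to match the packages listed above (pyyaml is imported as `yaml`).

```python
import importlib.util

# Import names assumed for the packages listed above (pyyaml imports as "yaml").
REQUIRED = ["torch", "torchvision", "simplejson", "decord", "yaml",
            "einops", "oss2", "psutil", "tqdm", "pandas"]
OPTIONAL = ["fvcore"]  # only needed for FLOPs calculation


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("missing required:", missing_packages(REQUIRED))
    print("missing optional:", missing_packages(OPTIONAL))
```

An empty list for the required packages suggests the environment was created correctly; version constraints such as simplejson==3.11.1 are not checked by this sketch.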

Data

First, download the datasets from their original sources and put them into data.

Then, prepare the data according to the splits we provide.

Preprocessing Something-Something-V2 dataset

As in the pytorch-video-understanding codebase, the video decoder decord has difficulty decoding the original .webm files, so we provide a script that converts the .webm files in the original something-something-v2 dataset to .mp4 files. To do this, run:

python datasets/utils/preprocess_ssv2_annos.py --anno --anno_path path_to_your_annotation
python datasets/utils/preprocess_ssv2_annos.py --data --data_path path_to_your_ssv2_videos --data_out_path path_to_put_output_videos

Make sure the annotation files are organized as follows:

-- path_to_your_annotation
    -- something-something-v2-train.json
    -- something-something-v2-validation.json
    -- something-something-v2-labels.json
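The layout above can be verified before running the preprocessing script. This is a minimal sketch rather than repository code: the helper name `missing_annotation_files` is hypothetical, and only the three file names shown above are checked.

```python
from pathlib import Path

# File names taken from the layout shown above.
EXPECTED_ANNOTATIONS = (
    "something-something-v2-train.json",
    "something-something-v2-validation.json",
    "something-something-v2-labels.json",
)


def missing_annotation_files(anno_dir):
    """Return the expected annotation files that are absent from anno_dir."""
    root = Path(anno_dir)
    return [name for name in EXPECTED_ANNOTATIONS if not (root / name).is_file()]
```

Calling `missing_annotation_files("path_to_your_annotation")` returns an empty list when all three files are in place.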

Running

The entry file for all runs is runs/run.py.

Before running, some settings need to be configured.

For an example run of 1-shot on SSV2-Small, all configurations can be found in configs; you can modify any of them as needed.

Then the codebase can be run by:

python runs/run.py --cfg ./configs/projects/EMP_Net/ssv2_small/EMP_Net_SSv2_Small_1shot_v1.yaml
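When launching several experiments, it can be convenient to wrap the command above in a small script that fails early if a config file is missing. The sketch below is an illustration only: `build_run_command` and `launch` are hypothetical helpers, and the only flag assumed is the `--cfg` option shown above.

```python
import subprocess
import sys
from pathlib import Path


def build_run_command(cfg_path, entry="runs/run.py"):
    """Build the command shown above for a given config file."""
    return [sys.executable, entry, "--cfg", str(cfg_path)]


def launch(cfg_path):
    """Run the entry script, failing early if the config file is missing."""
    if not Path(cfg_path).is_file():
        raise FileNotFoundError(f"config not found: {cfg_path}")
    return subprocess.run(build_run_command(cfg_path), check=True)
```

For example, `launch("./configs/projects/EMP_Net/ssv2_small/EMP_Net_SSv2_Small_1shot_v1.yaml")` runs the same command as above from the repository root.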

Citation

If you find this model useful for your research, please use the following BibTeX entry.

@inproceedings{wuefficient,
  author={Wu, Cong and Wu, Xiao-Jun and Li, Linze and Xu, Tianyang and Feng, Zhenhua and Kittler, Josef},
  title={Efficient Few-Shot Action Recognition via Multi-Level Post-Reasoning},
  booktitle={European Conference on Computer Vision},
  year={2024},
  organization={Springer}
}

Acknowledgement

Thanks for the framework provided by CLIP-FSAR, the source code of the published work CLIP-guided Prototype Modulating for Few-shot Action Recognition (IJCV 2023).

Contact

For any further questions, feel free to contact: congwu@jiangnan.edu.cn.
