[ECCV2024 oral] C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Rongchang Li, Zhenhua Feng, Tianyang Xu, Linze Li, Xiaojun Wu†, Muhammad Awais, Sara Atito, Josef Kittler
ECCV, 2024
Figure: Zero-Shot Compositional Action Recognition (ZS-CAR). Seen: open a door; seen: close a book; unseen: close a door.
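A minimal sketch of the split property illustrated above, assuming compositions are represented as (verb, object) tuples: a test composition counts as unseen when the pair itself never occurs in training, while its verb and its object each occur in some training pair. The function name `unseen_compositions` is hypothetical and is not part of this repo.

```python
# Hypothetical illustration of the ZS-CAR split: every verb and every object
# appears during training, but some verb-object compositions do not.


def unseen_compositions(train_pairs, test_pairs):
    """Return test compositions that never occur in training but whose
    verb and object components both occur in some training composition."""
    seen_pairs = set(train_pairs)
    seen_verbs = {verb for verb, _ in seen_pairs}
    seen_objects = {obj for _, obj in seen_pairs}
    return {
        (verb, obj)
        for verb, obj in test_pairs
        if (verb, obj) not in seen_pairs
        and verb in seen_verbs
        and obj in seen_objects
    }


# The example from the figure: "close" and "door" are both seen, "close a door" is not.
train = [("open", "door"), ("close", "book")]
test = [("open", "door"), ("close", "door")]
print(sorted(unseen_compositions(train, test)))  # [('close', 'door')]
```

A pair such as ("close", "window") would not be returned, because "window" never appears in training, so it is not a compositional zero-shot case.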
Figure: Some samples in Something-composition.
- Download Something-Something V2 (Sth-v2). Our proposed Something-composition (Sth-com) is built on Sth-v2. Please refer to the official website to download the videos into video_path.
- Extract frames. To speed up data loading during training, we extract the frames of each video and save them in frame_path. The command is:
python tools/extract_frames.py --video_root video_path --frame_root frame_path
- Download the dataset annotations. We provide our Sth-com annotation files in the data_split dir; please download them to annotation_path. Each file has the following format:
[
  {
    "id": "54463",                  # the sample name
    "action": "opening a book",     # the composition
    "verb": "Opening [something]",  # the verb component
    "object": "book"                # the object component
  },
  { ... },
  { ... }
]
- Once these steps are complete, the dataset is ready. Its structure looks like this:
  - annotation_path
    - data_split
      - generalized
        - train_pairs.json
        - val_pairs.json
        - test_pairs.json
  - frame_path
    - 0
      - 000001.jpg
      - 000002.jpg
      - ......
    - 1
      - 000001.jpg
      - 000002.jpg
      - ......
    - ......
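A minimal sketch of how the annotations and frames could be read from the layout above. It assumes the released split files are plain JSON lists (without the explanatory # comments shown in the format example) and that each sample's frames sit in frame_path/<id>/. The helpers `load_split`, `frame_files` and `component_vocab` are hypothetical and are not the repo's dataloader.

```python
import json
import os


def load_split(annotation_root, split):
    """Load one Sth-com split, e.g. split="train" -> train_pairs.json.

    Assumes plain JSON: a list of dicts with the keys
    "id", "action", "verb" and "object".
    """
    path = os.path.join(annotation_root, "data_split", "generalized",
                        f"{split}_pairs.json")
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def frame_files(frame_root, sample_id, num_frames):
    """Paths of the first num_frames extracted frames of one video, following
    the assumed layout frame_path/<id>/000001.jpg, 000002.jpg, ..."""
    return [os.path.join(frame_root, str(sample_id), f"{i:06d}.jpg")
            for i in range(1, num_frames + 1)]


def component_vocab(samples):
    """Collect the sorted verb, object and (verb, object) vocabularies of a split."""
    verbs = sorted({s["verb"] for s in samples})
    objects = sorted({s["object"] for s in samples})
    pairs = sorted({(s["verb"], s["object"]) for s in samples})
    return verbs, objects, pairs
```

For instance, `component_vocab(load_split(annotation_path, "train"))` would give the verbs, objects and verb-object compositions seen during training, which is the information the seen/unseen distinction is built on.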
🔔 From now on, we take the codes dir as the project root.
- Prepare the word embedding models. We recommend following CompCos to download them.
- Modify the following paths (for example, to run C2C_vanilla with TSM-18 as the backbone):
  - dataset_path in ./config/c2c_vanilla_tsm.yml
  - save_path in ./config/c2c_vanilla_tsm.yml
  - the line t=fasttext.load_model('YOUR_PATH/cc.en.300.bin') in models/vm_models/word_embedding.py
- Train a model with the command:
CUDA_VISIBLE_DEVICES=YOUR_GPU_INDICES python train.py --config config/c2c_vm/c2c_vanilla_tsm.yml
- For testing, suppose you have trained your model and its log dir is YOUR_LOG_PATH. Then test it with:
CUDA_VISIBLE_DEVICES=YOUR_GPU_INDICES python test_for_models.py --logpath YOUR_LOG_PATH
- Add training code for the VM + word embedding paradigm.
- Add training code for the VLM paradigm.