[ECCV 2024 oral] C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition


C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition

Rongchang Li, Zhenhua Feng, Tianyang Xu, Linze Li, Xiaojun Wu†, Muhammad Awais, Sara Atito, Josef Kittler
ECCV, 2024


Figure: Zero-Shot Compositional Action Recognition (ZS-CAR). Seen: "Open a door"; Seen: "Close a book"; Unseen: "Close a door".
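
To make the seen/unseen distinction concrete, the following minimal sketch lists the compositions whose verb and object each occur in the seen set but never together. The function name `unseen_compositions` and the toy pairs are illustrative assumptions and are not part of the C2C codebase.

```python
from itertools import product


def unseen_compositions(seen_pairs):
    """Return (verb, object) pairs absent from seen_pairs whose parts are each seen."""
    seen = set(seen_pairs)
    verbs = sorted({verb for verb, _ in seen})
    objects = sorted({obj for _, obj in seen})
    # Every verb/object combination that was never observed jointly.
    return [pair for pair in product(verbs, objects) if pair not in seen]


# Toy example mirroring the seen/unseen labels above.
seen_pairs = [("open", "door"), ("close", "book")]
print(unseen_compositions(seen_pairs))  # [('close', 'door'), ('open', 'book')]
```

Here "close a door" appears as an unseen composition even though both "close" and "door" occur in the seen pairs.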

🛠️ Prepare Something-composition (Sth-com)

Some samples in Something-composition

  1. Download Something-Something V2 (Sth-v2). Our proposed Something-composition (Sth-com) is built on Sth-v2. Download the videos from the official website and place them in video_path.
  2. Extract frames. To speed up data loading during training, we extract the frames of each video and save them in frame_path. The command is:
    python tools/extract_frames.py --video_root video_path --frame_root frame_path
    
  3. Download the dataset annotations. We provide the Sth-com annotation files in the data_split directory. The format is as follows:
      [
          {
          "id": "54463", # means the sample name
          "action": "opening a book", # means composition
          "verb": "Opening [something]", # means the verb component
          "object": "book" # means the object component
          },
          {
            ...
          },
          {
            ...
          },
      ]
    
    Please download these files to annotation_path.
  4. The dataset is now ready. The directory structure looks like this:
    • annotation_path
      • data_split
        • generalized
          • train_pairs.json
          • val_pairs.json
          • test_pairs.json
    • frame_path
      • 0
        • 000001.jpg
        • 000002.jpg
        • ......
      • 1
        • 000001.jpg
        • 000002.jpg
        • ......
      • ......
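
As a quick sanity check on the prepared dataset, the sketch below loads one split file, summarizes it, and reports samples whose frame folder is missing or empty. The helper names (`load_annotations`, `summarize`, `missing_frame_dirs`) are hypothetical. It also assumes that the split files are plain JSON arrays (without the `#` explanations shown above) and that the frames of each sample live in a folder named after its `id`.

```python
import json
from collections import Counter
from pathlib import Path


def load_annotations(json_path):
    """Load one split file; each entry has id, action, verb, and object."""
    with open(json_path, "r", encoding="utf-8") as f:
        return json.load(f)


def summarize(samples):
    """Count samples, distinct verbs, distinct objects, and distinct compositions."""
    pairs = Counter((s["verb"], s["object"]) for s in samples)
    return {
        "samples": len(samples),
        "verbs": len({verb for verb, _ in pairs}),
        "objects": len({obj for _, obj in pairs}),
        "compositions": len(pairs),
    }


def missing_frame_dirs(samples, frame_root):
    """Return ids whose frame folder is absent or holds no .jpg files.

    Assumption: frames for a sample are stored in <frame_root>/<id>/.
    """
    root = Path(frame_root)
    missing = []
    for s in samples:
        folder = root / s["id"]
        if not folder.is_dir() or not any(folder.glob("*.jpg")):
            missing.append(s["id"])
    return missing


# In-memory example using the sample entry shown in the annotation format.
samples = [
    {"id": "54463", "action": "opening a book",
     "verb": "Opening [something]", "object": "book"},
]
print(summarize(samples))
```

With real data, one would call `load_annotations` on a file such as train_pairs.json under annotation_path and pass the result, together with frame_path, to `missing_frame_dirs`.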

🚀 Train and test

🔔 From here on, treat the codes directory as the project root.

Before running

  1. Prepare the word embedding models. We recommend following CompCos to download the word embedding models.

  2. Modify the following paths:

    (For example, to run C2C_vanilla with TSM-18 as the backbone.)

    1. dataset_path in ./config/c2c_vanilla_tsm.yml
    2. save_path in ./config/c2c_vanilla_tsm.yml
    3. The code line: t=fasttext.load_model('YOUR_PATH/cc.en.300.bin') in models/vm_models/word_embedding.py
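
As an alternative to editing the hard-coded model path, the following sketch reads the location of cc.en.300.bin from an environment variable and fails early if the file is missing. The function `resolve_fasttext_path` and the variable name `FASTTEXT_MODEL_PATH` are assumptions made for illustration; the repository itself only documents the hard-coded line.

```python
import os
from pathlib import Path


def resolve_fasttext_path(env=None, var="FASTTEXT_MODEL_PATH"):
    """Return the path to cc.en.300.bin taken from an environment variable.

    `var` is a hypothetical variable name, not one defined by the repository.
    """
    env = os.environ if env is None else env
    value = env.get(var)
    if not value:
        raise KeyError(f"Set {var} to the location of cc.en.300.bin")
    path = Path(value).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No fastText model found at {path}")
    return str(path)


# In models/vm_models/word_embedding.py the hard-coded line could then read:
#   t = fasttext.load_model(resolve_fasttext_path())
```

This keeps machine-specific paths out of the source file, at the cost of one extra variable to set before training.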

Train

  1. Train a model with the command:
CUDA_VISIBLE_DEVICES=YOUR_GPU_INDEXES python train.py --config config/c2c_vm/c2c_vanilla_tsm.yml

Test

  1. For testing, suppose you have trained your model and set the log directory to YOUR_LOG_PATH.

    You can then test it with:

CUDA_VISIBLE_DEVICES=YOUR_GPU_INDEXES python test_for_models.py --logpath YOUR_LOG_PATH

📝 TODO List

  • Add training code for the VM + word embedding paradigm.
  • Add training code for the VLM paradigm.