Action-slot

[CVPR 2024] Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes

¹Chi-Hsi Kung, ¹Shu-Wei Lu, ²Yi-Hsuan Tsai, ¹Yi-Ting Chen

¹National Yang Ming Chiao Tung University, ²Google

[CVPR] [arxiv] [Project Page] [Challenge ECCV'24]

This repository contains the official code for training and evaluating baselines presented in the paper.

🔥 News

  • 2024/06/03: The Atomic Activity Recognition Challenge is launched. Come win the prize!
  • 2024/05/26: Code for the Atomic Activity Recognition Challenge is released!
  • 2024/04/12: The Atomic Activity Recognition Challenge has been accepted by ECCV 2024! Check more details here.
  • 2024/02/27: Action-slot has been accepted by CVPR 2024!

🚀 Installation

Create and activate the conda environment, then install the dependencies:

conda create --name action_slot python=3.7
conda activate action_slot
pip install -r requirements.txt
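
To verify the environment, a quick import check like the one below should run without errors (a minimal sketch; it assumes requirements.txt installs PyTorch, which the training scripts require):

# sanity_check.py: confirm PyTorch is importable and whether a GPU is visible
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())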

📦 Datasets Download

TACO: [One-drive]

The TACO dataset consists of 13 folders of videos (scenarios), separated by map (e.g., Town01, Town02, ...) in the CARLA simulator and by collection method, i.e., autopilot (AP), scenario runner (runner), and manual collection [1] (interactive & non-interactive). We use data collected in Town03 as the val set and Town10HD as the test set. Please refer to the supplementary material for more dataset details. Note that we use both the train and val splits for training in our benchmark. Please also note that the number of videos in each split has been updated: train: 2753, val: 977, test: 1446.
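
For illustration, the town-based split convention above can be expressed as a small helper (a hedged sketch; the folder names and layout are assumptions based on the description, and the repo's own dataloaders are authoritative):

# assign each scenario folder to a split by the CARLA map in its name
# (Town03 -> val, Town10HD -> test, everything else -> train); names are assumed
import os

def split_of(folder_name: str) -> str:
    if "Town10HD" in folder_name:
        return "test"
    if "Town03" in folder_name:
        return "val"
    return "train"

root = "path_to_TACO"  # placeholder path
splits = {f: split_of(f) for f in sorted(os.listdir(root))}
print(splits)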

OATS [2] [Website]

[1] Kung et al., "RiskBench: A Scenario-based Benchmark for Risk Identification". ICRA 2024.

[2] Agarwal and Chen, "Ordered Atomic Activity for Fine-grained Interactive Traffic Scenario Understanding". ICCV 2023.

🔥🔥🔥 3rd ROAD Challenge for Atomic Activity Recognition

We provide a script that generates the .pkl prediction file on the test set for the challenge on TACO. Please note that:

  • we use both train_split and val_split for training, and report the baseline performance on the val and test sets.
  • each video is sampled to 16 frames as input (see the sampling sketch below).
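
For reference, uniform 16-frame sampling can be sketched as follows (index math only; the repo's own dataloader may sample differently, so treat this as an assumption):

# pick 16 evenly spaced frame indices from a clip of num_frames frames
import numpy as np

def sample_indices(num_frames: int, num_samples: int = 16) -> np.ndarray:
    return np.linspace(0, num_frames - 1, num_samples).astype(int)

print(sample_indices(120))  # e.g., a 120-frame video -> 16 indices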

The script takes Action-slot as an example:

cd scripts/
python generate_test_results.py --split [val/test] --cp [path_to_checkpoint] --model_name action_slot --backbone x3d --bg_slot --bg_mask --allocated_slot --root [path_to_TACO]

To participate in the challenge, simply compress the generated .pkl file into a .zip archive and upload it to the eval.ai platform.
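
If you prefer to script the packaging step, a minimal sketch is below (the prediction filename is a placeholder; use whatever name generate_test_results.py actually writes):

# compress the generated prediction file into a .zip archive for eval.ai
import zipfile

with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("predictions.pkl")  # placeholder filename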

🌐 Train & Evaluation on TACO

Training

# Action-slot
python train_taco.py --dataset taco --root [path_to_TACO] --model_name action_slot --num_slots 64 --bg_slot --bg_mask --action_attn_weight 1 --allocated_slot --bg_attn_weight 0.5

# X3D
python train_taco.py --dataset taco --root [path_to_TACO] --model_name x3d 

Evaluation

# Action-slot
python eval_taco.py --cp [path_to_checkpoint] --root [path_to_TACO] --dataset taco --model_name action_slot --num_slots 64 --bg_slot --allocated_slot

# X3D
python eval_taco.py --root [path_to_TACO] --cp [path_to_checkpoint] --dataset taco --model_name x3d 

🌐 Train & Evaluation on OATS

# Action-slot
python train_oats.py --dataset oats --root [path_to_dataset] --oats_test_split s1 --model_name action_slot --epochs 50 --num_slots 35 --bg_slot --bg_mask --action_attn_weight 0.1 --allocated_slot --bg_attn_weight 0.1 --ego_loss_weight 0

python eval_oats.py --cp [path_to_checkpoint] --dataset oats --oats_test_split s3  --root [path_to_dataset] --model_name action_slot --allocated_slot --backbone x3d --num_slots 35 --bg_slot 

🌐 Train & Evaluation on nuScenes

# train from scratch
python train_nuscenes.py --dataset nuscenes --root [path]/nuscenes/trainval/samples --model_name action_slot --num_slots 64 --bg_slot --bg_mask --action_attn_weight 1 --allocated_slot --bg_attn_weight 0.5 --bce_pos_weight 7

# transfer learning: TACO -> nuScenes
python train_nuscenes.py --pretrain taco --root [path]/nuscenes/trainval/samples --cp [path_to_checkpoint] --dataset nuscenes --model_name action_slot --num_slots 64 --bg_slot --bg_mask --action_attn_weight 1 --allocated_slot --bg_attn_weight 0.5 --bce_pos_weight 20

# transfer learning: OATS -> nuScenes
python train_nuscenes.py --pretrain oats --root [path]/nuscenes/trainval/samples --cp [path_to_checkpoint] --dataset nuscenes --model_name action_slot --num_slots 64 --bg_slot --bg_mask --action_attn_weight 1 --allocated_slot --bg_attn_weight 0.5 --bce_pos_weight 15

📊 Attention Visualization


python eval_taco.py --cp [path_to_checkpoint] --plot --dataset taco --root [path_to_TACO] --model_name action_slot --num_slots 64 --bg_slot --allocated_slot --plot_threshold 0.2
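
As a rough illustration of how such attention maps can be rendered, the sketch below overlays a single slot's attention on a frame (all file names and array shapes here are assumptions; eval_taco.py --plot produces the official visualizations):

# overlay one slot's attention map (h x w, values in [0, 1]) on an H x W x 3 frame
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

frame = np.array(Image.open("frame.png"))  # placeholder frame
attn = np.load("slot_attn.npy")            # placeholder attention map

# resize the attention map to the frame resolution before blending
attn_img = Image.fromarray((attn * 255).astype(np.uint8)).resize(
    (frame.shape[1], frame.shape[0]))

plt.imshow(frame)
plt.imshow(np.array(attn_img), cmap="jet", alpha=0.4)  # heatmap overlay
plt.axis("off")
plt.savefig("attention_overlay.png", bbox_inches="tight")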

Citation

@inproceedings{kung2023action,
  title={Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes},
  author={Kung, Chi-Hsi and Lu, Shu-Wei and Tsai, Yi-Hsuan and Chen, Yi-Ting},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}

Contact

Chi-Hsi Kung - hank910140@gmail dot com

Acknowledgement