[🌟CVPR2024 Highlight🌟] ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More
This repository contains the official PyTorch implementation of the paper "ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More".
If you find this paper useful, please consider starring 🌟 this repository and citing 📑 our paper:
@inproceedings{zhou2024exact,
title={ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More},
author={Zhou, Jiazhou and Zheng, Xu and Lyu, Yuanhuiyi and Wang, Lin},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={18633--18643},
year={2024}
}
- Refer to install.md for step-by-step guidance on how to install the packages.
- Download the ViT-B-16 CLIP pretrained backbone in this repository.
- Download the dataset to be evaluated and its corresponding model checkpoints from the SeAct Dataset section and the Model Checkpoints section below, respectively. Note that the train-val dataset splits are provided in the ./ExACT/Dataloader folder to ensure fair future comparisons.
- Preprocess the datasets to transform the raw events into event frames using our proposed 🌟AFE representation🌟. Remember to change the dataset directory path in dataset_name.py first.
python "./ExACT/Dataloader/AFE Preprocessing/dataset_name.py"
- Edit the dataset_name.yaml file in the Configs folder, setting config['MODEL']['Load_Path'], config['MODEL']['BACKBONE']['PRE_trained_model'], config['Dataset']['Train']['Path'], config['Dataset']['Val']['Path'], and config['Dataset']['Classnames'].
- Finally, evaluate ExACT with the following command:
python ./ExACT/evaluate_dp_dataset_name.py
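Before running the evaluation, it can help to confirm that the configuration keys listed in the steps above are actually filled in. The following is a minimal sketch, not part of the repository: it assumes the YAML file has already been loaded into a nested dict (for example with PyYAML's `yaml.safe_load`), and the helper name `missing_config_keys` and all placeholder paths are hypothetical.

```python
# Key paths taken from the configuration step above.
# The helper and the example values below are hypothetical.
REQUIRED_KEYS = [
    ("MODEL", "Load_Path"),
    ("MODEL", "BACKBONE", "PRE_trained_model"),
    ("Dataset", "Train", "Path"),
    ("Dataset", "Val", "Path"),
    ("Dataset", "Classnames"),
]


def missing_config_keys(config):
    """Return the dotted key paths that are absent or empty in a nested dict."""
    missing = []
    for path in REQUIRED_KEYS:
        node = config
        for key in path:
            # Stop descending as soon as a level is missing.
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if node in (None, "", [], {}):
            missing.append(".".join(path))
    return missing


# Placeholder values for illustration only; substitute your own paths.
example_config = {
    "MODEL": {
        "Load_Path": "/path/to/checkpoint",
        "BACKBONE": {"PRE_trained_model": "/path/to/clip_backbone"},
    },
    "Dataset": {
        "Train": {"Path": "/path/to/train"},
        "Val": {"Path": ""},  # left empty on purpose to show the check
    },
}

print(missing_config_keys(example_config))
```

With the example dict, the check reports `Dataset.Val.Path` (empty) and `Dataset.Classnames` (absent). An empty result means all five entries are set; it does not verify that the paths exist on disk.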
SeAct Dataset: Event action dataset with caption-level labels
We propose the semantic-abundant SeAct dataset for event-text action recognition, where a detailed caption-level label is provided for each action. SeAct is collected with a DAVIS346 event camera whose resolution is 346 × 260. It contains 58 actions under four themes, as presented in the following images. Each action is accompanied by an action caption of fewer than 30 words generated by GPT-4 to enrich the semantic space of the original action labels. For each category, we use 80% of the samples for training and 20% for testing (validation).
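To illustrate what an 80%/20% per-category split looks like, here is a minimal sketch. It does not reproduce the official split, which is the one shipped in the Dataloader folder and the one to use for comparisons. The function name `split_per_category`, the seed, the file names, and the labels are all hypothetical.

```python
import random
from collections import defaultdict


def split_per_category(samples, train_ratio=0.8, seed=0):
    """Split (path, label) pairs so each label keeps about train_ratio for training."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, val = [], []
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        cut = round(len(paths) * train_ratio)
        train += [(p, label) for p in paths[:cut]]
        val += [(p, label) for p in paths[cut:]]
    return train, val


# Hypothetical file names and labels, for illustration only.
samples = [
    (f"{label}_{i:02d}.npy", label)
    for label in ("walk", "sit")
    for i in range(10)
]
train, val = split_per_category(samples)
print(len(train), len(val))  # 16 4
```

Splitting inside each category, rather than over the pooled samples, keeps every action represented in both subsets.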
The following table provides download links for our SeAct dataset, as well as the PAF, DVS128Gesture, and HARDVS datasets used for comparison. Please refer to the .txt files in the Dataloader folder for the dataset structure.
Event Datasets | Access to Download Datasets |
---|---|
SeAct | Download |
PAF | Download |
DVS128Gesture | Download |
HARDVS | Download |
Model Checkpoints
Datasets | Access to Download Model Checkpoints |
---|---|
SeAct | Download Model checkpoints |
PAF | Download Model checkpoints |
DVS128Gesture | Download Model checkpoints |
HARDVS | Download Model checkpoints |
We thank the authors of EventBind and MAP for open-sourcing their wonderful work.
This repository is released under the MIT License.
If you have any questions about this project, please open an issue in this repository.