Though research has shown the complementarity of camera- and inertial-based data, datasets which offer both modalities remain scarce. In this paper, we introduce WEAR, an outdoor sports dataset for both vision- and inertial-based human activity recognition (HAR). The dataset comprises data from 18 participants performing a total of 18 different workout activities with untrimmed inertial (acceleration) and camera (egocentric video) data recorded at 10 different outdoor locations. Unlike previous egocentric datasets, WEAR provides a challenging prediction scenario marked by purposely introduced activity variations as well as an overall small information overlap across modalities. Provided benchmark results reveal that single-modality architectures each have different strengths and weaknesses in their prediction performance. Further, in light of the recent success of transformer-based temporal action localization models, we demonstrate their versatility by applying them in a plain fashion using vision, inertial and combined (vision + inertial) features as input. Results demonstrate both the applicability of vision-based transformers to inertial data and the feasibility of fusing both modalities by means of simple concatenation, with the combined approach (vision + inertial features) producing the highest mean average precision and a close-to-best F1-score. The code to reproduce experiments is publicly available here. An arXiv version of our paper is available at https://arxiv.org/abs/2304.05088.
- 14/06/2023: updated code base and arXiv available.
- 18/04/2023: provided code to reproduce experiments.
- 12/04/2023: initial commit and arXiv uploaded.
Please follow the instructions in the INSTALL.md file.
The full dataset can be downloaded here.
The download folder is divided into 3 subdirectories:
- annotations (> 1MB): JSON files containing per-subject annotations in the THUMOS14 style
- processed (15GB): precomputed per-subject I3D, inertial and combined features
- raw (130GB): raw per-subject video and inertial data
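As a rough illustration of what working with the annotation files might look like, the sketch below sums the annotated duration per label. It assumes the JSON layout used by common THUMOS14 annotation files (a top-level `database` mapping each recording to a list of `annotations` with `label` and `segment` fields); the actual WEAR files may differ, and the recording key and label names in the example are hypothetical placeholders.

```python
import json
from collections import Counter


def summarize_annotations(annotation: dict) -> dict:
    """Return the total annotated duration (in seconds) per label.

    Assumes a THUMOS14-style dictionary; this layout is an assumption
    and should be checked against the downloaded files.
    """
    totals = Counter()
    for recording in annotation.get("database", {}).values():
        for entry in recording.get("annotations", []):
            start, end = entry["segment"]  # assumed to be [start, end] in seconds
            totals[entry["label"]] += end - start
    return dict(totals)


# Hypothetical in-memory example; with real data you would instead call
# json.load() on one of the per-subject files in the annotations folder.
example = json.loads("""
{
  "database": {
    "hypothetical_subject": {
      "subset": "Training",
      "annotations": [
        {"label": "activity_a", "segment": [0.0, 10.0]},
        {"label": "activity_b", "segment": [12.0, 17.0]},
        {"label": "activity_a", "segment": [20.0, 22.5]}
      ]
    }
  }
}
""")

summary = summarize_annotations(example)
print(summary)
```

Because the recordings are untrimmed, stretches of a recording that fall outside every listed segment are simply not counted by this summary.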
Once the requirements are installed, you can rerun the experiments by running the main.py script:
python main.py --config ./configs/60_frames_30_stride/actionformer_combined.yaml --seed 1 --eval_type split
Each config file represents one type of experiment. Each experiment was run three times using three different random seeds (i.e. 1, 2, 3). To rerun the experiments without changing anything about the config files, please place the complete dataset download into a folder called data/wear in the main directory of the repository.
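The setup described above can be sketched as a small helper that checks the dataset placement and assembles one command per seed. This is only an illustrative sketch: the helper names `missing_subdirs` and `build_commands` are hypothetical and not part of the repository, and it assumes that the three download subdirectories sit directly inside data/wear. The command-line flags are the ones shown in the example command.

```python
import sys
from pathlib import Path

# Subdirectories of the dataset download; assumed to sit directly in data/wear.
EXPECTED_SUBDIRS = ("annotations", "processed", "raw")


def missing_subdirs(data_root):
    """Return the expected dataset subdirectories not found under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


def build_commands(config, seeds=(1, 2, 3), eval_type="split"):
    """Build one main.py invocation per random seed as an argument list."""
    return [
        [sys.executable, "main.py",
         "--config", config,
         "--seed", str(seed),
         "--eval_type", eval_type]
        for seed in seeds
    ]


if __name__ == "__main__":
    missing = missing_subdirs("data/wear")
    if missing:
        print("Missing in data/wear:", ", ".join(missing))
    # Dry run: print the commands instead of executing them.
    config = "./configs/60_frames_30_stride/actionformer_combined.yaml"
    for command in build_commands(config):
        print(" ".join(command))
```

Each printed argument list can be passed to `subprocess.run(command, check=True)` from the repository root to actually start the run.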
Please follow the instructions in the README.md file in the postprocessing subfolder.
In order to log experiments to Neptune.ai, please provide project and api_token information in your local deployment (see lines 34-35 in main.py).
Please follow the instructions in the README.md file in the data creation subfolder.
WEAR is offered under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You are free to use, copy, and redistribute the material for non-commercial purposes provided you give appropriate credit, provide a link to the license, and indicate if changes were made. If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. You may not use the material for commercial purposes.
Marius Bock (marius.bock@uni-siegen.de)
@article{bock2023wear,
title={WEAR: An Outdoor Sports Dataset for Wearable and Egocentric Video Activity Recognition},
author={Bock, Marius and Kuehne, Hilde and Van Laerhoven, Kristof and Moeller, Michael},
volume={abs/2304.05088},
journal={CoRR},
year={2023},
url={https://arxiv.org/abs/2304.05088}
}