Awesome Temporal Action Segmentation

A curated list of awesome temporal action segmentation resources. Inspired by awesome-machine-learning-resources.

⭐ Please leave a STAR if you like this project! ⭐

The Task
Datasets
Evaluation
- Acc
- F1
- Edit
Core Techniques
Paper List

The Task

Temporal action segmentation takes an untrimmed video sequence as input, segments it along the temporal dimension into clips, and infers the semantics of the action in each clip.
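As a rough illustration of the task's output, a model typically predicts one action label per frame, and contiguous runs of the same label form the segments. The sketch below collapses a frame-wise label sequence into segments; the function name `frames_to_segments` and the action labels are hypothetical and chosen only for this example.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Collapse per-frame labels into (label, start, end) segments.

    `start` is inclusive and `end` is exclusive, both in frame indices.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical frame-wise prediction for a six-frame video.
frames = ["pour", "pour", "pour", "stir", "stir", "pour"]
print(frames_to_segments(frames))
# [('pour', 0, 3), ('stir', 3, 5), ('pour', 5, 6)]
```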

Surveys & Overviews

ATLAS tutorial in conjunction with ECCV 2022. [Tutorial] [Talk]
Temporal Action Segmentation: An Analysis of Modern Techniques [pdf]
- Guodong Ding, Fadime Sener, and Angela Yao

Datasets

Multiple datasets have been used to benchmark the performance of temporal action segmentation approaches. The most commonly adopted datasets are as follows:

Breakfast

The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities [pdf]
- Hilde Kuehne, Ali Arslan, and Thomas Serre, CVPR 2014.

Breakfast Actions targets recording videos "in the wild", in 18 different kitchens, as opposed to the controlled lab environments of earlier datasets (e.g., GTEA and 50Salads). The participants are not given any scripts, and the recordings are unrehearsed and undirected. The dataset comprises 10 breakfast-related activities.
It was recorded with 52 participants using 3 to 5 cameras, all from a third-person point of view. Counting each camera view separately, there are 1712 videos.

GTEA

Learning to Recognize Objects in Egocentric Activities [pdf]
- Alireza Fathi, Xiaofeng Ren, and James M. Rehg, CVPR 2011.

GTEA contains 28 videos of seven procedural activities recorded in a single kitchen. The videos are recorded with a camera mounted on a cap worn by four participants.

50Salads

Combining Embedded Accelerometers with Computer Vision for Recognizing Food Preparation Activities [pdf]
- Sebastian Stein, and Stephen J. McKenna, UbiComp 2013.

50Salads is composed of 50 recorded videos of 25 participants making two different mixed salads. The videos are captured by a camera with a top-down view of the work surface. The participants are provided with recipe steps that are randomly sampled from a statistical recipe model.

YouTube Instructional

Unsupervised Learning from Narrated Instruction Videos [pdf]
- Jean-Baptiste Alayrac, Piotr Bojanowski, Nishant Agrawal, Josef Sivic, Ivan Laptev, and Simon Lacoste-Julien, CVPR 2016.

Assembly101

Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities [pdf]
- Fadime Sener, Dibyadip Chatterjee, Daniel Shelepov, Kun He, Dipika Singhania, Robert Wang, and others, CVPR 2022.

Assembly101 is a recently collected dataset in which 53 participants were asked to disassemble and assemble take-apart toys without being given any instructions, which resulted in realistic sequences with great variation in action ordering. The dataset is annotated with fine-grained hand-object interactions and with coarse action labels, each composed of multiple fine-grained action segments related to attaching or detaching a vehicle part. The authors evaluated their dataset for temporal action segmentation using the coarse labels.

Evaluation Measures

Acc

Accuracy, or MoF (mean over frames), is a per-frame measure that calculates the fraction of frames correctly recognized by the temporal action segmentation model:

$\text{Acc}=\frac{\text{number of correct frames}}{\text{number of all frames}}$
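The ratio above can be computed directly from two aligned frame-wise label sequences. The following is a minimal sketch; the function name `frame_accuracy` and the toy labels are assumptions made for illustration, not part of any benchmark toolkit.

```python
def frame_accuracy(predicted, ground_truth):
    """Fraction of frames whose predicted label equals the ground-truth label."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same number of frames")
    if not ground_truth:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return correct / len(ground_truth)


# Toy example: three of the four frames are labeled correctly.
pred = ["a", "a", "b", "b"]
gt = ["a", "b", "b", "b"]
print(frame_accuracy(pred, gt))  # 0.75
```

Because every frame counts equally, this measure is dominated by long segments and does not penalize over-segmentation, which is why it is usually reported together with the segment-level measures below.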

F1 score

The F1 score, or F1@$\tau$, compares the Intersection over Union (IoU) of each predicted segment with the corresponding ground-truth segment against a threshold $\tau/100$, where $\tau$ is typically set to values in $\{10,25,50\}$. A predicted segment is considered a true positive if its IoU with the ground truth exceeds the threshold. If more than one predicted segment satisfies this within the span of a single ground-truth action, only one is marked as a true positive and the others are marked as false positives.

$\text{F1} = 2 \cdot \frac{\text{precision}\times \text{recall}}{\text{precision} +\text{recall}}$.
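The matching procedure described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not a reference evaluation script: the names `to_segments` and `f1_at_tau` are hypothetical, each predicted segment is matched to the same-label ground-truth segment with the highest IoU, and the threshold comparison uses `>=` (switch to `>` for a strict reading of "exceeds").

```python
from itertools import groupby


def to_segments(frame_labels):
    """Collapse per-frame labels into (label, start, end) with exclusive end."""
    segments, start = [], 0
    for label, run in groupby(frame_labels):
        end = start + len(list(run))
        segments.append((label, start, end))
        start = end
    return segments


def f1_at_tau(predicted, ground_truth, tau):
    """Segment-level F1 at IoU threshold tau/100 for one video."""
    threshold = tau / 100.0
    gt_segs = to_segments(ground_truth)
    matched = [False] * len(gt_segs)  # each ground-truth segment may be hit once
    tp = fp = 0
    for p_label, p_start, p_end in to_segments(predicted):
        best_iou, best_idx = 0.0, -1
        for idx, (g_label, g_start, g_end) in enumerate(gt_segs):
            if g_label != p_label:
                continue
            inter = max(0, min(p_end, g_end) - max(p_start, g_start))
            union = (p_end - p_start) + (g_end - g_start) - inter
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        # Assumption: ">=" is used here; a repeated hit on a matched segment is a false positive.
        if best_idx >= 0 and best_iou >= threshold and not matched[best_idx]:
            matched[best_idx] = True
            tp += 1
        else:
            fp += 1
    fn = matched.count(False)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Over-segmented prediction: one spurious "b" frame splits the single "a" action.
gt = ["a"] * 8
pred = ["a", "a", "a", "a", "b", "a", "a", "a"]
print(round(f1_at_tau(pred, gt, tau=25), 3))  # 0.5
```

In the toy example the first `a` segment is a true positive, while the spurious `b` segment and the second `a` segment both count as false positives, so precision is 1/3 and recall is 1. Benchmarks commonly accumulate true positives, false positives, and false negatives over all videos before computing F1; the per-video version here is a simplification.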

Edit score

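The Edit score is computed using the Levenshtein distance $e$, which quantifies how similar two sequences are by counting the minimum number of operations required to convert one (segment) string into the other. With $X$ and $Y$ denoting the predicted and ground-truth sequences of segment labels, the distance is normalized by the length of the longer sequence:

$\text{Edit} = \left(1-\frac{e(X,Y)}{\text{max}(|X|,|Y|)}\right) \cdot 100$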
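A minimal sketch of this measure is given below, assuming the common convention of subtracting the length-normalized Levenshtein distance from one. The function names `levenshtein` and `edit_score` and the toy labels are hypothetical; the segment strings $X$ and $Y$ are obtained by collapsing consecutive identical frame labels.

```python
from itertools import groupby


def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def edit_score(predicted, ground_truth):
    """Edit score in [0, 100] computed on segment-level label sequences."""
    x = [label for label, _ in groupby(predicted)]
    y = [label for label, _ in groupby(ground_truth)]
    if not x and not y:
        return 100.0
    return (1 - levenshtein(x, y) / max(len(x), len(y))) * 100


# Predicted segments: a, b, a, c. Ground-truth segments: a, c.
pred = ["a", "a", "b", "a", "a", "c"]
gt = ["a", "a", "a", "a", "c", "c"]
print(edit_score(pred, gt))  # 50.0
```

Since only the order of segment labels matters, the score ignores segment durations and boundary positions; it mainly penalizes over-segmentation and ordering errors.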

Paper List

Fully-Supervised

2022

Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation, [pdf]
- Nadine Behrmann, S. Alireza Golestaneh, Zico Kolter, Juergen Gall, and Mehdi Noroozi, ECCV 2022.

2021

ASFormer: Transformer for Action Segmentation, [pdf] [code]
- Fangqiu Yi, Hongyu Wen, and Tingting Jiang, BMVC 2021.
Alleviating Over-segmentation Errors by Detecting Action Boundaries, [pdf] [code]
- Yuchi Ishikawa, Seito Kasai, Yoshimitsu Aoki, and Hirokatsu Kataoka, WACV 2021.
Coarse to Fine Multi-Resolution Temporal Convolutional Network, [pdf] [code]
- Dipika Singhania, Rahul Rahaman, and Angela Yao, arXiv 2021.
FIFA: Fast Inference Approximation for Action Segmentation, [pdf]
- Yaser Souri, Yazan Abu Farha, Fabien Despinoy, Gianpiero Francesca, and Juergen Gall, GCPR 2021.
Global2Local: Efficient Structure Search for Video Action Segmentation, [pdf] [code]
- Shang-Hua Gao, Qi Han, Zhong-Yu Li, Pai Peng, Liang Wang, and Ming-Ming Cheng, CVPR 2021.

2020

Action Segmentation with Mixed Temporal Domain Adaptation, [pdf]
- Min-Hung Chen, Baopu Li, Yingze Bao, and Ghassan AlRegib, WACV 2020.
Boundary-Aware Cascade Networks for Temporal Action Segmentation, [pdf] [code]
- Zhenzhi Wang, Ziteng Gao, Limin Wang, Zhifeng Li, and Gangshan Wu, ECCV 2020.
Improving Action Segmentation via Graph Based Temporal Reasoning, [pdf]
- Yifei Huang, Yusuke Sugano, and Yoichi Sato, CVPR 2020.
Temporal Aggregate Representations for Long Term Video Understanding, [pdf]
- Fadime Sener, Dipika Singhania, and Angela Yao, ECCV 2020.

2019

MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation, [pdf] [code]
- Yazan Abu Farha, and Juergen Gall, CVPR 2019.

2018

Temporal Deformable Residual Networks for Action Segmentation in Videos, [pdf]
- Peng Lei, and Sinisa Todorovic, CVPR 2018.

2017

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, [pdf] [code]
- João Carreira, and Andrew Zisserman, CVPR 2017.
Temporal Convolutional Networks for Action Segmentation and Detection, [pdf] [code]
- Colin Lea, Michael D. Flynn, René Vidal, Austin Reiter, and Gregory D. Hager, CVPR 2017.

2016

An end-to-end generative framework for video segmentation and recognition, [pdf]
- Hilde Kuehne, Juergen Gall, and Thomas Serre, WACV 2016.
Temporal Action Detection Using a Statistical Language Model, [pdf] [code]
- Alexander Richard, and Juergen Gall, CVPR 2016.

Weakly-Supervised

2022

A Generalized & Robust Framework For Timestamp Supervision in Temporal Action Segmentation, [pdf] [code]
- Rahul Rahaman, Dipika Singhania, Alexandre Thiery, and Angela Yao, ECCV 2022.
Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos, [pdf]
- Reza Ghoddoosian, Saif Sayed, and Vassilis Athitsos, WACV 2022.
Temporal Action Segmentation with High-level Complex Activity Labels, [pdf]
- Guodong Ding, and Angela Yao, arXiv 2022.
Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation, [pdf]
- Nadine Behrmann, S. Alireza Golestaneh, Zico Kolter, Juergen Gall, and Mehdi Noroozi, ECCV 2022.

2021

Anchor-Constrained Viterbi for Set-Supervised Action Segmentation, [pdf]
- Jun Li, and Sinisa Todorovic, CVPR 2021.
Learning Discriminative Prototypes with Dynamic Time Warping, [pdf]
- Xiaobin Chang, Frederick Tung, and Greg Mori, CVPR 2021.
Temporal Action Segmentation from Timestamp Supervision, [pdf] [code]
- Zhe Li, Yazan Abu Farha, and Juergen Gall, CVPR 2021.

2020

Fast Weakly Supervised Action Segmentation Using Mutual Consistency, [pdf]
- Yaser Souri, Mohsen Fayyaz, Luca Minciullo, Gianpiero Francesca, and Juergen Gall, ECCV 2020.
Procedure Completion by Learning from Partial Summaries, [pdf]
- Zwe Naing, and Ehsan Elhamifar, BMVC 2020.
SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation, [pdf] [code]
- Mohsen Fayyaz, and Juergen Gall, CVPR 2020.
Set-Constrained Viterbi for Set-Supervised Action Segmentation, [pdf]
- Jun Li, and Sinisa Todorovic, CVPR 2020.

2019

D3TW: Discriminative differentiable dynamic time warping for weakly supervised action alignment and segmentation, [pdf] [code]
- Chien-Yi Chang, De-An Huang, Yanan Sui, Fei-Fei Li, and Juan Carlos Niebles, CVPR 2019.
Weakly Supervised Energy-Based Learning for Action Segmentation, [pdf] [code]
- Jun Li, Peng Lei, and Sinisa Todorovic, ICCV 2019.

2018

A Hybrid RNN-HMM Approach for Weakly Supervised Temporal Action Segmentation, [pdf]
- Hilde Kuehne, Alexander Richard, and Juergen Gall, TPAMI 2018.
Action Sets: Weakly Supervised Action Segmentation Without Ordering Constraints, [pdf] [code]
- Alexander Richard, Hilde Kuehne, and Juergen Gall, CVPR 2018.
NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning, [pdf] [code]
- Alexander Richard, Hilde Kuehne, Ahsan Iqbal, and Juergen Gall, CVPR 2018.
Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment, [pdf] [code]
- Li Ding, and Chenliang Xu, CVPR 2018.

2017

Weakly supervised action learning with RNN based fine-to-coarse modeling, [pdf] [code]
- Alexander Richard, Hilde Kuehne, and Juergen Gall, CVPR 2017.
Weakly supervised learning of actions from transcripts, [pdf]
- Hilde Kuehne, Alexander Richard, and Juergen Gall, CVIU 2017.

2016

Connectionist temporal modeling for weakly supervised action labeling, [pdf] [code]
- De-An Huang, Fei-Fei Li, and Juan Carlos Niebles, ECCV 2016.

Unsupervised

2022

Fast and Unsupervised Action Boundary Detection for Action Segmentation, [pdf]
- Zexing Du, Xue Wang, Guoqing Zhou, and Qing Wang, CVPR 2022.

2021

Action Shuffle Alternating Learning for Unsupervised Action Segmentation, [pdf]
- Jun Li, and Sinisa Todorovic, CVPR 2021.
Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences, [pdf]
- Rosaura G. VidalMata, Walter J. Scheirer, Anna Kukleva, David Cox, and Hilde Kuehne, WACV 2021.
Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation, [pdf] [code]
- M Saquib Sarfraz, Naila Murray, Vivek Sharma, Ali Diba, Luc Van Gool, and Rainer Stiefelhagen, CVPR 2021.

2020

Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation, [pdf] [code]
- Min-Hung Chen, Baopu Li, Yingze Bao, Ghassan AlRegib, and Zsolt Kira, CVPR 2020.

2019

Unsupervised learning of action classes with continuous temporal embedding, [pdf] [code]
- Anna Kukleva, Hilde Kuehne, Fadime Sener, and Juergen Gall, CVPR 2019.
Unsupervised procedure learning via joint dynamic summarization, [pdf]
- Ehsan Elhamifar, and Zwe Naing, ICCV 2019.

2018

Unsupervised Learning and Segmentation of Complex Activities from Video, [pdf] [code]
- Fadime Sener, and Angela Yao, CVPR 2018.

2016

Unsupervised learning from narrated instruction videos, [pdf]
- Jean-Baptiste Alayrac, Piotr Bojanowski, Nishant Agrawal, Josef Sivic, Ivan Laptev, and Simon Lacoste-Julien, CVPR 2016.

2015

Unsupervised Semantic Parsing of Video Collections, [pdf]
- Ozan Sener, Amir R. Zamir, Silvio Savarese, and Ashutosh Saxena, ICCV 2015.

Semi-Supervised

2022

Iterative Contrast-Classify for Semi-supervised Temporal Action Segmentation, [pdf] [code]
- Dipika Singhania, Rahul Rahaman, and Angela Yao, AAAI 2022.
Leveraging Action Affinity and Continuity for Semi-supervised Temporal Action Segmentation, [pdf] [code]
- Guodong Ding, and Angela Yao, ECCV 2022.

Citation

If you find this repository helpful in your research, please consider citing our survey as:

@article{ding2022temporal,
    title={Temporal Action Segmentation: An Analysis of Modern Techniques},
    author={Ding, Guodong and Sener, Fadime and Yao, Angela},
    journal={arXiv preprint arXiv:2210.10352},
    year={2022}
}

Feedback

Suggestions and corrections are welcome via pull request.
