Continual Online Action Detection Transformer

This repository contains the Online Action Recognition Experiments from our work on Continual Transformers.

This repository is a fork of the official source code for "OadTR: Online Action Detection with Transformers" (ICCV 2021) [Paper], with model variations specified in different branches.

Set-up

Package Dependencies

Install the dependencies from the original OadTR project:

pip install torch torchvision numpy tensorboard-logger

Install the Continual Transformer blocks:

pip install --upgrade git+https://github.com/LukasHedegaard/continual-transformers.git

Pretrained features

Unzip the annotation file "./data/anno_thumos.zip".
Download the features:
- THUMOS14-Anet feature
- THUMOS14-Kinetics feature
- TVSeries features are available by contacting the authors of the dataset and signing agreements, due to copyright restrictions.

Following this guide, we extracted features using TSN ResNet-50 RGB and Flow models pretrained on ActivityNet and Kinetics.
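The unzip step can also be scripted. The snippet below is a minimal sketch using only the Python standard library; the helper name unzip_annotations and the choice to extract next to the archive are assumptions for illustration, not part of this repository.

```python
import zipfile
from pathlib import Path


def unzip_annotations(zip_path="./data/anno_thumos.zip", dest=None):
    """Extract the annotation archive and return the names of its members."""
    zip_path = Path(zip_path)
    # Assumption: extract next to the archive unless a destination is given.
    dest = Path(dest) if dest is not None else zip_path.parent
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

Calling unzip_annotations() from the repository root would extract the archive into ./data under these assumptions.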

Once you have downloaded the THUMOS features and placed them under ~/data, you can select the features by appending the following to your python command:

ActivityNet (default):
- --feature Anet2016_feature_v2
Kinetics:
- --feature V3

Experiments

CoOadTR

From the main branch, the CoOadTR model can be run with the following command:

python main.py --num_layers 1 --enc_layers 64 --cpe_factor 1

Here, num_layers denotes the number of transformer blocks (1 or 2), enc_layers is the sequence length, and cpe_factor is a multiplier for the number of unique circular positional embeddings (1 <= x <= 2).
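To make the role of cpe_factor concrete, the sketch below computes which positional-embedding slots a window of enc_layers tokens would use at a given time step. It assumes that the table holds cpe_factor * enc_layers unique embeddings and that consecutive steps advance through it with wrap-around. The function circular_positions is hypothetical and is not taken from this repository's code.

```python
def circular_positions(step, seq_len=64, cpe_factor=1):
    """Return the embedding indices used by the window that starts at `step`."""
    # Number of unique positional embeddings: the multiplier times the sequence length.
    num_embeddings = int(cpe_factor * seq_len)
    # Assumption: the window takes consecutive slots and wraps around the table.
    return [(step + i) % num_embeddings for i in range(seq_len)]
```

With seq_len=4, cpe_factor=1 gives [1, 2, 3, 0] at step 1, while cpe_factor=2 gives [6, 7, 0, 1] at step 6, since the table then has 8 slots.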

OadTR ablations

Each experiment has its own branch. The table below gives an overview of the ablated features and the associated results for the TSN-Anet features:

Encoder-layers	Decoder	Class-token	Circular encoding	mAP (%)	branch	command
3	✔︎	✔︎	-	57.8	original (baseline)	`python main.py --num_layers 3 --decoder_layers 5 --enc_layers 64`
3	-	✔︎	-	56.8	no-decoder	`python main.py --num_layers 3 --enc_layers 64`
2	-	✔︎	-	55.6	no-decoder	`python main.py --num_layers 2 --enc_layers 64`
2	-	-	-	55.5	no-decoder-no-cls-token	`python main.py --num_layers 2 --enc_layers 64`
1	-	-	✔︎ (len n)	55.7	no-decoder-no-cls-token-shifting-tokens	`python main.py --num_layers 1 --enc_layers 64`
1	-	-	✔︎ (len 2n)	55.8	no-decoder-no-cls-token-shifting-tokens-2x	`python main.py --num_layers 1 --enc_layers 64`

THUMOS

Model	branch	command
OadTR	original	`python main.py --num_layers 3 --decoder_layers 5 --enc_layers 64 --feature <FEATURE>`
OadTR-b2	no-decoder-no-cls-token	`python main.py --num_layers 2 --enc_layers 64 --feature <FEATURE>`
OadTR-b1	no-decoder-no-cls-token	`python main.py --num_layers 1 --enc_layers 64 --feature <FEATURE>`
CoOadTR-b2	main	`python main.py --num_layers 2 --enc_layers 64 --feature <FEATURE>`
CoOadTR-b1	main	`python main.py --num_layers 1 --enc_layers 64 --feature <FEATURE>`

Here, <FEATURE> is either "anet" or "kin" for ActivityNet and Kinetics pretrained features, respectively. Set --dim_feature to 3072 for "anet" and to 4096 for "kin".
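Because --feature and --dim_feature have to agree, it can help to generate the command from a single lookup. The helper below is a sketch only: build_command and FEATURE_DIMS are hypothetical names, and the flags and values are the ones listed in this section.

```python
import shlex

# Feature dimensionality per feature set, as stated in this section.
FEATURE_DIMS = {"anet": 3072, "kin": 4096}


def build_command(feature, num_layers, enc_layers=64, decoder_layers=None):
    """Assemble a main.py command line with a matching --dim_feature."""
    if feature not in FEATURE_DIMS:
        raise ValueError(f"unknown feature set: {feature!r}")
    args = ["python", "main.py", "--num_layers", str(num_layers)]
    if decoder_layers is not None:
        # Only the original OadTR baseline uses decoder layers.
        args += ["--decoder_layers", str(decoder_layers)]
    args += [
        "--enc_layers", str(enc_layers),
        "--feature", feature,
        "--dim_feature", str(FEATURE_DIMS[feature]),
    ]
    return shlex.join(args)
```

For example, build_command("kin", num_layers=1) yields the CoOadTR-b1 command with --feature kin --dim_feature 4096 appended.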

TVSeries

Model	branch	command
OadTR	original-tvseries	`python main.py --num_layers 3 --decoder_layers 5 --enc_layers 64 --feature <FEATURE>`
OadTR-b2	no-decoder-no-cls-token-tvseries	`python main.py --num_layers 2 --enc_layers 64 --feature <FEATURE>`
OadTR-b1	no-decoder-no-cls-token-tvseries	`python main.py --num_layers 1 --enc_layers 64 --feature <FEATURE>`
CoOadTR-b2	main	`python main.py --dataset tvseries --num_layers 2 --enc_layers 64 --feature <FEATURE>`
CoOadTR-b1	main	`python main.py --dataset tvseries --num_layers 1 --enc_layers 64 --feature <FEATURE>`

Here, <FEATURE> is the name of your .pickle file of extracted features (either ActivityNet or Kinetics features), placed in the ~/data folder.
