Ablation study for "OadTR: Online Action Detection with Transformers".

Continual Online Action Detection Transformer

This repository contains the Online Action Recognition Experiments from our work on Continual Transformers.

This repository is a fork of the official source code for "OadTR: Online Action Detection with Transformers" (ICCV 2021) [Paper], with model variations kept in separate branches.

Set-up

Package Dependencies

Install the dependencies from the original OadTR project:

pip install torch torchvision numpy tensorboard-logger

Install Continual Transformer blocks

pip install --upgrade git+https://github.com/LukasHedegaard/continual-transformers.git

Pretrained features

Once you have downloaded the THUMOS features and placed them under ~/data, you can select which features to use by appending one of the following to your Python command:

  • ActivityNet (default):
    • --feature Anet2016_feature_v2
  • Kinetics:
    • --feature V3

Experiments

CoOadTR

From the main branch, the CoOadTR model can be run with the following command:

python main.py --num_layers 1 --enc_layers 64 --cpe_factor 1

Here, num_layers denotes the number of transformer blocks (1 or 2), enc_layers is the sequence length, and cpe_factor is a multiplier x for the number of unique circular positional embeddings (1 <= x <= 2).
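
To make the role of cpe_factor more concrete, the sketch below computes how many unique positional embeddings a circular encoding would hold and which embedding indices a sliding window of tokens would use. It assumes that the count is int(cpe_factor * enc_layers) and that the token seen at time t uses index t modulo that count. Both are illustrative assumptions rather than a description of this repository's code, and the function names are hypothetical.

```python
from typing import List


def num_circular_embeddings(enc_layers: int, cpe_factor: float) -> int:
    """Number of unique positional embeddings.

    Assumption: the count is cpe_factor times the sequence length.
    """
    if not 1 <= cpe_factor <= 2:
        raise ValueError("cpe_factor is expected to lie in [1, 2]")
    return int(cpe_factor * enc_layers)


def circular_positions(step: int, enc_layers: int, cpe_factor: float) -> List[int]:
    """Embedding indices for the window of enc_layers tokens ending at time `step`.

    Assumption: the token observed at time t uses embedding index t mod N,
    so indices wrap around once the stream is longer than N.
    """
    n = num_circular_embeddings(enc_layers, cpe_factor)
    first = step - enc_layers + 1
    return [t % n for t in range(first, step + 1)]
```

With enc_layers=64, cpe_factor=1 gives 64 unique embeddings and cpe_factor=2 gives 128, matching the "len n" and "len 2n" variants in the ablation table.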

OadTR ablations

Each experiment has its own branch. The table below gives an overview of the ablated features and the associated results, using the TSN-Anet features:

Encoder layers | Decoder | Class token | Circular encoding | mAP (%) | Branch | Command
3 | ✔︎ | ✔︎ | - | 57.8 | original (baseline) | python main.py --num_layers 3 --decoder_layers 5 --enc_layers 64
3 | - | ✔︎ | - | 56.8 | no-decoder | python main.py --num_layers 3 --enc_layers 64
2 | - | ✔︎ | - | 55.6 | no-decoder | python main.py --num_layers 2 --enc_layers 64
2 | - | - | - | 55.5 | no-decoder-no-cls-token | python main.py --num_layers 2 --enc_layers 64
1 | - | - | ✔︎ (len n) | 55.7 | no-decoder-no-cls-token-shifting-tokens | python main.py --num_layers 1 --enc_layers 64
1 | - | - | ✔︎ (len 2n) | 55.8 | no-decoder-no-cls-token-shifting-tokens-2x | python main.py --num_layers 1 --enc_layers 64
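
Since every ablation lives on its own branch, reproducing a row of the table amounts to checking out the branch and running its command. The sketch below only prints the corresponding shell lines and executes nothing. ABLATIONS transcribes the branch and command columns of the table above, assuming the branch for the baseline is named "original"; the helper name shell_lines is hypothetical.

```python
# (branch, command) pairs transcribed from the ablation table.
# Assumption: the baseline branch is named "original".
ABLATIONS = [
    ("original", "python main.py --num_layers 3 --decoder_layers 5 --enc_layers 64"),
    ("no-decoder", "python main.py --num_layers 3 --enc_layers 64"),
    ("no-decoder", "python main.py --num_layers 2 --enc_layers 64"),
    ("no-decoder-no-cls-token", "python main.py --num_layers 2 --enc_layers 64"),
    ("no-decoder-no-cls-token-shifting-tokens", "python main.py --num_layers 1 --enc_layers 64"),
    ("no-decoder-no-cls-token-shifting-tokens-2x", "python main.py --num_layers 1 --enc_layers 64"),
]


def shell_lines(ablations):
    """Return the shell lines (checkout, then run) for each ablation."""
    lines = []
    for branch, command in ablations:
        lines.append(f"git checkout {branch}")
        lines.append(command)
    return lines


if __name__ == "__main__":
    # Print the lines so they can be inspected or pasted into a terminal.
    print("\n".join(shell_lines(ABLATIONS)))
```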

THUMOS

Model | Branch | Command
OadTR | original | python main.py --num_layers 3 --decoder_layers 5 --enc_layers 64 --feature <FEATURE>
OadTR-b2 | no-decoder-no-cls-token | python main.py --num_layers 2 --enc_layers 64 --feature <FEATURE>
OadTR-b1 | no-decoder-no-cls-token | python main.py --num_layers 1 --enc_layers 64 --feature <FEATURE>
CoOadTR-b2 | main | python main.py --num_layers 2 --enc_layers 64 --feature <FEATURE>
CoOadTR-b1 | main | python main.py --num_layers 1 --enc_layers 64 --feature <FEATURE>

Here, <FEATURE> is either "anet" or "kin" for ActivityNet- and Kinetics-pretrained features, respectively. Set --dim_feature to 3072 for "anet" and to 4096 for "kin".
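
Because --dim_feature has to match the chosen features, it can help to derive both flags from a single value. The following is a minimal sketch that assembles a THUMOS command string from the flags and values listed above; the helper thumos_command and the FEATURE_DIMS mapping are hypothetical and not part of the repository.

```python
from typing import Optional

# Feature dimensionality per feature type, as stated above.
FEATURE_DIMS = {"anet": 3072, "kin": 4096}


def thumos_command(num_layers: int, feature: str,
                   decoder_layers: Optional[int] = None) -> str:
    """Build a main.py command with a --dim_feature that matches --feature."""
    if feature not in FEATURE_DIMS:
        raise ValueError(f"feature must be one of {sorted(FEATURE_DIMS)}")
    parts = ["python main.py", f"--num_layers {num_layers}"]
    if decoder_layers is not None:
        # Only the original OadTR configuration uses a decoder.
        parts.append(f"--decoder_layers {decoder_layers}")
    parts += [
        "--enc_layers 64",
        f"--feature {feature}",
        f"--dim_feature {FEATURE_DIMS[feature]}",
    ]
    return " ".join(parts)
```

For example, thumos_command(2, "kin") yields a two-block command with --feature kin --dim_feature 4096, to be run on the branch given in the table.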

TVSeries

Model | Branch | Command
OadTR | original-tvseries | python main.py --num_layers 3 --decoder_layers 5 --enc_layers 64 --feature <FEATURE>
OadTR-b2 | no-decoder-no-cls-token-tvseries | python main.py --num_layers 2 --enc_layers 64 --feature <FEATURE>
OadTR-b1 | no-decoder-no-cls-token-tvseries | python main.py --num_layers 1 --enc_layers 64 --feature <FEATURE>
CoOadTR-b2 | main | python main.py --dataset tvseries --num_layers 2 --enc_layers 64 --feature <FEATURE>
CoOadTR-b1 | main | python main.py --dataset tvseries --num_layers 1 --enc_layers 64 --feature <FEATURE>

Here, <FEATURE> is the name of your .pickle file of extracted features (either ActivityNet or Kinetics features), placed in the ~/data folder.