S-RAD-ActionLocalizationClassification

Single run action detector on video stream data


S-RAD: Single Run Action Detector - A Privacy Preserving Approach [arXiv]

Introduction

Single Run Action Detector (S-RAD) is a real-time, privacy-preserving action detector that performs end-to-end action localization and classification. It is based on Faster R-CNN combined with temporal shift modeling and segment-based sampling to capture human actions. Results on the UCF-Sports and UR Fall datasets show accuracy comparable to state-of-the-art approaches, with significantly lower model size and computational demand, and with the ability to run in real time on edge embedded devices (e.g. the NVIDIA Jetson Xavier). Our paper pre-print can be found here.
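For readers unfamiliar with temporal shift modeling, the snippet below sketches the idea behind a TSM-style shift (a minimal illustration for intuition only, not the code used in this repository): a fraction of the channels, controlled by a divisor such as the --shift_div flag used later, is shifted one step forward or backward along the temporal dimension so that 2D convolutions can mix information across neighbouring frames.

import torch

def temporal_shift(x, n_segments, shift_div=8):
    # x has shape (batch * n_segments, channels, height, width)
    nt, c, h, w = x.size()
    x = x.view(nt // n_segments, n_segments, c, h, w)
    fold = c // shift_div                                  # number of channels shifted each way
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                   # first slice: shifted towards earlier frames
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]   # second slice: shifted towards later frames
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # remaining channels stay in place
    return out.view(nt, c, h, w)

# Example: 2 clips of 8 segments, with 64-channel feature maps of size 14x14
features = torch.randn(2 * 8, 64, 14, 14)
shifted = temporal_shift(features, n_segments=8, shift_div=8)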

Overview

We release the PyTorch code of S-RAD.



Preparation

First, clone the repository:

git clone https://github.com/TeCSAR-UNCC/S-RAD-ActionLocalizationClassification.git S-RAD

Then, move to the folder:

cd S-RAD

Pre-Requisites

  • Python 3
  • PyTorch 1.4.0
  • torchvision 0.5.0
  • CUDA 10.1
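If these packages are not already installed, the pinned PyTorch and torchvision versions can be installed with pip, for example (a suggested command only; adjust it to your CUDA setup):

pip3 install torch==1.4.0 torchvision==0.5.0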

Compilation

Install all the python dependencies using pip:

pip3 install -r requirements.txt

Compile the CUDA dependencies using the following commands:

cd lib
python3 setup.py build develop

This compiles all the modules you need, including NMS, ROI_Pooling, ROI_Align and ROI_Crop.
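Before moving on, it can help to confirm that PyTorch and the GPU are visible to your environment (a simple sanity check, not part of the repository):

python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"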

Dataset Preparation

  1. Modify the dataset path, log directory and model directory in the config file to the paths you are using:

    S-RAD/lib/model/utils/config.py
    

    All model- and dataset-related parameters can be updated in this config file according to the dataset and model used.

  2. The frame lists for the two datasets are provided at the paths below:

    S-RAD/dataset_config/UCF_Sports/frames_list/
    S-RAD/dataset_config/UR_falldataset/frame_list/
    

    Frame lists are in the format: video path, number of frames, class label (see the illustrative example after this list).

  3. Change the video paths in the framelist.txt files for each dataset to the location where the dataset is stored in your environment.

  4. The annotations for the UR-Fall dataset are derived from a COCO-pretrained detector in MMDetection; the bounding-box annotations for both datasets are provided at the following paths:

     S-RAD/dataset_config/UR_falldataset/annotation/
     S-RAD/dataset_config/UCF_Sports/ucfsports-anno/
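As an illustration of the frame-list format from step 2, an entry might look like the line below (the path, separator and numeric class label here are only assumptions; use the provided frame lists as the authoritative format):

    /path/to/UCF_Sports/Diving-Side/001 55 0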
    

Train

Before training, set the directory for saving and loading trained models in S-RAD/lib/model/utils/config.py, and adjust the number of workers in the config file according to the batch size.

UCF-Sports:

To train on UCF-Sports with ResNet-50 and 8 segments per clip, run:

python3 trainval_net.py --dataset ucfsport 
                        --net res50 
                        --bs 3 --lr 0.01 
                        --lr_decay_step 60 
                        --cuda --num_segments 8 
                        --acc_step 2  --s 16 
                        --epochs 300 --loss_type softmax 
                        --shift --shift_div 8 
                        --shift_place blockres 
                        --tune_from kinetics_resnet50.pth --pathway naive

Argument notes:

  • bs: batch size (default 1)
  • s: session number, used to differentiate training sessions
  • epochs: maximum number of epochs
  • loss_type: loss function (sigmoid by default)
  • acc_step: gradient accumulation step (gradient accumulation is implemented)
  • tune_from: Kinetics-400 checkpoint used for transfer learning
  • shift_div: proportion of channels/feature maps that are temporally shifted
  • num_segments: number of frames sampled per clip
  • pathway: naive by default

A V100 GPU accommodated a batch size of 3 (24 frames) at learning rate 0.01 with lr_decay_step of 60. To obtain the results reported in the paper, freeze the first block of ResNet (RESNET.FIXED_BLOCKS_1) in the config file at S-RAD/lib/model/utils/config.py.
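Since acc_step relies on gradient accumulation, the snippet below sketches what accumulation conceptually does (an illustrative, self-contained example with a stand-in model and dummy data, not the repository's training loop): losses from acc_step consecutive mini-batches are accumulated before a single optimizer update, so the effective batch size is bs * acc_step.

import torch
import torch.nn as nn

model = nn.Linear(10, 2)                                              # stand-in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
batches = [(torch.randn(3, 10), torch.randn(3, 2)) for _ in range(8)]  # dummy mini-batches
acc_step = 2                                                          # corresponds to --acc_step

optimizer.zero_grad()
for i, (inputs, targets) in enumerate(batches):
    loss = nn.functional.mse_loss(model(inputs), targets) / acc_step  # scale so gradients average
    loss.backward()                                                   # gradients accumulate across mini-batches
    if (i + 1) % acc_step == 0:
        optimizer.step()                                              # one update per acc_step mini-batches
        optimizer.zero_grad()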

UR_Fall Dataset:

To train on the UR Fall dataset with ResNet-50 and 8 segments per clip, run:

python3 trainval_net.py --dataset urfall 
                        --net res50 
                        --bs 4 --lr 0.02 
                        --lr_decay_step 20 
                        --cuda --pathway naive 
                        --num_segments 8 --acc_step 3  
                        --s 12 --epochs 80 
                        --loss_type softmax 
                        --shift --shift_div 8 
                        --shift_place blockres 
                        --tune_from kinetics_resnet50.pth
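As a rough check of what these flags imply: with --bs 4 and --acc_step 3, gradients are accumulated over 3 mini-batches, so each parameter update effectively sees 12 clips (96 frames at 8 segments per clip), assuming the accumulation scheme sketched above.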

Test

UCF-Sports:

To evaluate the detection performance of a pre-trained ResNet-50 model on the UCF-Sports test set, run:

python3 trainval_net.py --dataset ucfsport 
                        --net res50 
                        --bs 3 --cuda 
                        --num_segments 8 
                        --loss_type softmax
                        --shift --shift_div 8
                        --shift_place blockres
                        --checkpoint 37 --checksession 45 
                        --checkepoch 3 --r True
                        --evaluate --eval_metrics
                        --pathway naive

Specify the model's session, checkepoch and checkpoint, e.g. SESSION=1, EPOCH=6, CHECKPOINT=416, and set num_of_workers = 0 in the config file.
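For instance, the UCF-Sports command above (--checksession 45 --checkepoch 3 --checkpoint 37) loads the model saved during training session 45 at epoch 3, checkpoint 37; the corresponding file must already exist under the model directory configured in config.py.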

UR-Fall :

To evaluate the detection performance of a pre-trained ResNet-50 model on the UR Fall test set, run:

python3 trainval_net.py --dataset urfall 
                        --net res50 --bs 3 
                        --cuda --num_segments 8 
                        --loss_type softmax 
                        --shift --shift_div 8 
                        --shift_place blockres 
                        --checkpoint 13 --checksession 13 
                        --checkepoch 41 --r True 
                        --evaluate --eval_metrics 
                        --pathway naive

Results

UCF-Sports

State-of-the-art per-class frame mAP comparison on UCF-Sports:

| Action Class | Action_tubes | Learning_to_track | Multiregion | TCNN | S-RAD |
| --- | --- | --- | --- | --- | --- |
| Diving | 75.79 | 60.71 | 96.12 | 84.37 | 99.90 |
| Golf | 69.29 | 77.54 | 80.46 | 90.79 | 87.20 |
| Kicking | 54.60 | 65.26 | 73.48 | 86.48 | 76.00 |
| Lifting | 99.09 | 100.00 | 99.17 | 99.76 | 99.96 |
| Riding | 89.59 | 99.53 | 97.56 | 100.0 | 99.90 |
| Run | 54.89 | 52.60 | 82.37 | 83.65 | 89.79 |
| Skate Boarding | 29.80 | 47.14 | 57.43 | 68.71 | 67.93 |
| Swing1 | 88.70 | 88.87 | 83.64 | 65.75 | 88.78 |
| Swing2 | 74.50 | 62.85 | 98.50 | 99.71 | 99.9 |
| Walk | 44.70 | 64.43 | 75.98 | 87.79 | 40.71 |

Overall frame mAP at IoU 0.5 threshold comparison on the UCF-Sports Action dataset:

|  | Action_tubes | Learning to track | Multiregion | TCNN | ACTdetector | videocapsule_net | S-RAD |
| --- | --- | --- | --- | --- | --- | --- | --- |
| mAP | 68.09 | 71.90 | 84.51 | 86.70 | 87.7 | 83.9 |  |

UR-Fall

State-of-the-art per-frame comparison on the UR Fall dataset:

|  | Alaoui_et_al | Lu_et_Al | Cameiro_et_al | Leite_et_al |
| --- | --- | --- | --- | --- |
| Sensitivity | 100 | - | 100 |  |
| Specificity | 95 | - | 98.61 |  |
| Accuracy | 97.5 | 99.27 | 98.77 |  |

Pretrained-Models

| Dataset | Checkpoint | log |
| --- | --- | --- |
| UCF Sports | link | txt |
| UR-Fall | link | txt |
| Kinetics-400 | link | - |
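The Kinetics-400 checkpoint listed above is presumably the kinetics_resnet50.pth file passed via --tune_from in the training commands, used to initialize the backbone for transfer learning.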

Citation

If you would like to cite the S-RAD paper, please use the following citation:

@misc{saravanan2021single,
      title={Single Run Action Detector over Video Stream -- A Privacy Preserving Approach}, 
      author={Anbumalar Saravanan and Justin Sanchez and Hassan Ghasemzadeh and Aurelia Macabasco-O'Connell and Hamed Tabkhi},
      year={2021},
      eprint={2102.03391},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

We also recommend citing the Temporal Shift Module (TSM) and Faster R-CNN, which inspired this work.

Reference

This repository makes use of the following methods: