SSIVD-Net: A Novel Salient Super Image Classification & Detection Technique for Weaponized Violence

INTRODUCTION

This project detects violence and weaponized violence in CCTV footage. We introduce the Smart-City CCTV Violence Detection (SCVD) dataset, designed to help models learn the distribution of weapons in surveillance videos. To handle the complexity of analyzing 3D surveillance video, we propose SSIVD-Net (Salient-Super-Image for Violence Detection), which uses Salient-Super-Image representations to reduce data complexity and dimensionality while improving inference, performance, and explainability. We also introduce the Salient-Classifier, a novel architecture that combines a kernelized approach with a residual learning strategy. In our experiments, this approach outperforms state-of-the-art models at detecting both weaponized and non-weaponized violence. By advancing violence detection and the understanding of weapon distribution, this work supports smarter, more secure cities and stronger public safety measures.

METHOD

The table below lists the layer arrangement of each Salient-Classifier architecture and its number of parameters (in millions) for each block type:

Classifier	Layer Arrangement	Minimal Block (m)	Basic Block (b)	Bottleneck (n)
SaliNet-2	1, 1, 0, 0	1.8	4.9	8.0
SaliNet-4	1, 1, 1, 1	1.8	4.9	8.0
SaliNet-8	2, 2, 2, 2	4.9	11.2	14.0
SaliNet-16	3, 4, 6, 3	10.0	21.3	23.5
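
The variant names used later in the results (for example SaliNet-2m) combine a classifier row from the table above with a block-type suffix (m, b, or n). The following is a minimal sketch of that lookup, not code from the repository: the names SALINET_LAYERS, BLOCK_TYPES, PARAMS_M, and parse_variant are hypothetical, and the parameter counts are assumed to be in millions, matching the Num_Params (M) column of the SOTA comparison.

```python
# Hypothetical lookup tables transcribed from the table above.
# None of these names come from the repository itself.
SALINET_LAYERS = {
    "SaliNet-2": (1, 1, 0, 0),
    "SaliNet-4": (1, 1, 1, 1),
    "SaliNet-8": (2, 2, 2, 2),
    "SaliNet-16": (3, 4, 6, 3),
}

BLOCK_TYPES = {"m": "minimal", "b": "basic", "n": "bottleneck"}

# Parameter counts per classifier and block type (assumed to be millions).
PARAMS_M = {
    "SaliNet-2": {"m": 1.8, "b": 4.9, "n": 8.0},
    "SaliNet-4": {"m": 1.8, "b": 4.9, "n": 8.0},
    "SaliNet-8": {"m": 4.9, "b": 11.2, "n": 14.0},
    "SaliNet-16": {"m": 10.0, "b": 21.3, "n": 23.5},
}


def parse_variant(name):
    """Split a variant name such as 'SaliNet-2m' into its table entries."""
    base, suffix = name[:-1], name[-1]
    if base not in SALINET_LAYERS or suffix not in BLOCK_TYPES:
        raise ValueError(f"unknown variant: {name!r}")
    return {
        "classifier": base,
        "layers": SALINET_LAYERS[base],
        "block": BLOCK_TYPES[suffix],
        "params_m": PARAMS_M[base][suffix],
    }
```

For example, parse_variant("SaliNet-2m") returns the layer arrangement (1, 1, 0, 0) with the minimal block and 1.8 for the parameter count.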

RESULTS

Main Results

Ablation over super-image parameters (number of frames k, grid shape, sampler, and aspect ratio) using the SaliNet-2m variant:

k - grid_shape	Sampler	Aspect Ratio	Accuracy(%)	AP(%)	Inference time (s)
4 - 2x2	uniform	square	78.4	80.5	0.04
4 - 2x2	random	square	75.5	79.9	0.05
4 - 2x2	continuous	square	74.4	76.2	0.04
4 - 2x2	mean_abs	square	71.1	77.7	0.15
4 - 2x2	LK	square	69.6	78.2	0.21
4 - 2x2	centered	square	73.2	78.7	0.04
4 - 2x2	consecutive	square	70.4	79.4	0.04
6 - 3x2	uniform	144p_A	78.9	81.2	0.05
6 - 3x2	uniform	144p_B	79.7	81.9	0.05
6 - 3x2	uniform	240p_A	80.9	84.0	0.05
6 - 3x2	uniform	240p_B	81.3	84.2	0.05
6 - 3x2	uniform	360p_A	78.4	81.9	0.05
6 - 3x2	uniform	360p_B	82.4	83.8	0.05
6 - 3x2	uniform	480p_A	83.0	83.4	0.05
6 - 3x2	uniform	480p_B	83.0	83.4	0.06
9 - 3x3	uniform	square	84.7	85.0	0.06
12 - 4x3	uniform	480p_A	86.6	89.6	0.07
15 - 5x3	uniform	480p_A	84.3	86.8	0.08
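
To make the k, grid_shape, and uniform sampler columns above concrete, here is a minimal sketch of how k evenly spaced frames could be tiled into a single grid image. It is an illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, frames are plain 2D lists of pixel values, the grid is filled row by row, and the saliency step of SSIVD-Net is not covered. Which number in a grid shape such as 4x3 is the row count is also an assumption left to the caller.

```python
def uniform_indices(num_frames, k):
    """Pick k frame indices evenly spaced across a clip of num_frames frames."""
    if k <= 0 or num_frames < k:
        raise ValueError("need 0 < k <= num_frames")
    return [(i * num_frames) // k for i in range(k)]


def build_super_image(frames, rows, cols):
    """Tile rows * cols uniformly sampled frames into one 2D grid image.

    Each frame is a 2D list (height x width) and all frames share one size.
    """
    picked = [frames[i] for i in uniform_indices(len(frames), rows * cols)]
    super_image = []
    for r in range(rows):
        # Frames that sit side by side in this row of the grid.
        row_tiles = picked[r * cols:(r + 1) * cols]
        for y in range(len(row_tiles[0])):
            line = []
            for tile in row_tiles:
                line.extend(tile[y])  # append this tile's pixel row
            super_image.append(line)
    return super_image
```

With 2D array libraries such as NumPy the tiling would normally be a reshape or concatenate; plain lists are used here only to keep the sketch dependency-free.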

Comparing our Salient-Classifiers with SOTA:

Model	Num_Params (M)	Accuracy (%)
FGN	0.3	74.4
Conv-LSTM	47.4	71.6
Sep-Conv-LSTM	0.4	78.4
SaliNet-2m	1.8	86.6
SaliNet-4m	1.8	83.1
SaliNet-8m	4.9	77.8
SaliNet-2b	4.9	75.9
SaliNet-2n	8.0	78.8

Comparing our Salient-Classifiers with SOTA on other datasets:

Method	Model	MovieFight	HockeyFight	SCVD
3D-CNNs	C3D	100.0	96.5	82.8
3D-CNNs	I3D	100.0	98.5	85.8
3D-CNNs	FGN	100.0	98.0	87.3
Conv-LSTM	Conv-LSTM	100.0	97.1	77.0
Conv-LSTM	Bi-Conv-LSTM	100.0	98.1	-
Conv-LSTM	Sep-Conv-LSTM	100.0	99.5	89.3
Salient-Classifiers	SaliNet-2m	100.0	100.0	88.5
Salient-Classifiers	SaliNet-2b	100.0	100.0	89.7
Salient-Classifiers	SaliNet-2n	100.0	100.0	90.3

USAGE

ENVIRONMENT SETUP

Libraries:

PyTorch
NumPy
OpenCV
tqdm
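
As a quick sanity check before training, the sketch below reports which of the libraries listed above cannot be imported. The helper name missing_libraries is hypothetical. The import names are the standard ones for these packages (OpenCV is imported as cv2); on PyPI they are published as torch, numpy, opencv-python, and tqdm. No specific versions are given in this README, so none are checked here.

```python
import importlib.util

# Display name -> import name for the libraries listed above.
REQUIRED_MODULES = {
    "PyTorch": "torch",
    "NumPy": "numpy",
    "OpenCV": "cv2",
    "tqdm": "tqdm",
}


def missing_libraries(modules=REQUIRED_MODULES):
    """Return the display names of libraries that cannot be found."""
    return [
        label
        for label, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]
```

Calling missing_libraries() returns an empty list when everything is installed.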

TRAINING

In the main.py file, edit the parameters to match your task.
Ensure that the video dataset is arranged like the structure below:
- VideoDataset
  - Train
    - Class A
    - Class B
  - Test
    - Class A
    - Class B
Open the Scripts/ssi.py file and edit the class names to match your dataset.
Run python main.py
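
Before running the training script, it can help to confirm that the dataset folder matches the layout described in the steps above. The following is a minimal sketch, assuming video files sit directly inside each class folder (this README does not say so explicitly). The function name summarize_dataset is hypothetical and is not part of the repository.

```python
from pathlib import Path


def summarize_dataset(root):
    """Return {split: {class_name: file_count}} for the Train and Test splits."""
    root = Path(root)
    summary = {}
    for split in ("Train", "Test"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        # Count the files found directly inside each class folder.
        summary[split] = {
            class_dir.name: sum(1 for f in class_dir.iterdir() if f.is_file())
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    if set(summary["Train"]) != set(summary["Test"]):
        raise ValueError("Train and Test contain different class folders")
    return summary
```

The class folder names it reports are the ones that should be entered in Scripts/ssi.py.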

NOTE

For the updated paper, link
For the dataset, download from here. A preprocessed version can be downloaded here. If you use our dataset or code, please cite our paper and like our repository.

BIB: @InProceedings{
         10.1007/978-3-031-62269-4_2,
         author="Aremu, Toluwani
         and Zhiyuan, Li
         and Alameeri, Reem
         and Khan, Mustaqeem
         and Saddik, Abdulmotaleb El",
         editor="Arai, Kohei",
         title="SSIVD-Net: A Novel Salient Super Image Classification and Detection Technique for Weaponized Violence",
         booktitle="Intelligent Computing",
         year="2024",
         publisher="Springer Nature Switzerland",
         address="Cham",
         pages="16--35",
         isbn="978-3-031-62269-4"
}

Springer Nature: Aremu, T., Zhiyuan, L., Alameeri, R., Khan, M., Saddik, A.E. (2024). SSIVD-Net: A Novel Salient Super
Image Classification and Detection Technique for Weaponized Violence. In: Arai, K. (eds) Intelligent Computing. SAI 2024.
Lecture Notes in Networks and Systems, vol 1018. Springer, Cham. https://doi.org/10.1007/978-3-031-62269-4_2.

APA: Aremu, T., Zhiyuan, L., Alameeri, R., Khan, M., & Saddik, A. E. (2024, June). SSIVD-Net: A Novel Salient Super Image
Classification and Detection Technique for Weaponized Violence. In Science and Information Conference (pp. 16-35).
Cham: Springer Nature Switzerland.
