Accident-detection

Detecting accidents/crashes in CCTV footage.

This repository contains the 3D Residual Network code for my minor project coursework, titled 'Accident Identification using Deep Learning'. The detailed report is included alongside this repository.

3D ResNet details

Let x denote the input clip of size 3×L×H×W, where L is the number of frames in the input video, H and W are the video frame height and width, and 3 refers to the RGB channels. In this model, each block consists of two convolutional layers with a ReLU activation function after each layer, without bottleneck layers. Let z^[i] and a^[i] be, respectively, the tensor computed by the i^th convolutional block in the residual network and the activations obtained after applying the ReLU function. With g^[i] denoting the ReLU activation, the output of this i^th residual block can be represented as:

a^[i+2] = g^[i] (a^[i] + z^[i+2])
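
As a minimal illustration of the skip connection in the equation above, the sketch below applies it elementwise to flattened tensors in plain Python. It assumes g is ReLU, as described in the text. The function names `relu` and `residual_output` are hypothetical and are not taken from the repository's code.

```python
def relu(values):
    """Apply ReLU elementwise to a flat list of numbers."""
    return [max(0.0, v) for v in values]


def residual_output(a_in, z_out):
    """Return g(a_in + z_out), where g is ReLU.

    a_in  -- activations entering the block (a^[i]), flattened
    z_out -- pre-activation output of the block's second convolution
             (z^[i+2]), flattened
    """
    if len(a_in) != len(z_out):
        raise ValueError("skip connection requires matching shapes")
    return relu([a + z for a, z in zip(a_in, z_out)])


a_in = [1.0, 0.5, 2.0]
z_out = [-2.0, 0.25, 1.0]
print(residual_output(a_in, z_out))  # [0.0, 0.75, 3.0]
```

The length check mirrors a practical constraint: the identity shortcut can only be added when the input and the convolutional output have the same shape.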

The tensor z^[i] in this case is 4D and has size N_i×L×H_i×W_i, where N_i is the number of filters used in the i^th block. Each filter is itself 4-dimensional, with size N_{i-1}×t×d×d, where N_{i-1} is the number of channels entering the block, t denotes the temporal extent of the filter, and d denotes its spatial extent. The filters are convolved in three dimensions, i.e., over both time and space. The outputs of these convolutional layers feed into the final layers of the network, where global average pooling is applied over the entire spatio-temporal volume and a fully connected layer produces the final classification prediction, as seen in the figure below:


3D ResNet Architecture.
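
To make the filter sizes and the pooling step concrete, here is a small plain-Python sketch. It assumes a standard convolution in which every filter spans all input channels. The helper names (`conv3d_param_count`, `global_average_pool`) and the example channel counts are hypothetical, chosen only for illustration.

```python
def conv3d_param_count(in_channels, out_channels, t, d, bias=False):
    """Count the weights of a 3D convolution with out_channels filters,
    each of size in_channels x t x d x d."""
    weights = out_channels * in_channels * t * d * d
    return weights + (out_channels if bias else 0)


def global_average_pool(channel_volume):
    """Average one channel over its entire L x H x W volume."""
    values = [v for frame in channel_volume for row in frame for v in row]
    return sum(values) / len(values)


# 64 filters over 64 input channels, temporal extent 3, spatial extent 3.
print(conv3d_param_count(64, 64, t=3, d=3))  # 110592

# A tiny 2 x 2 x 2 (L x H x W) volume for a single channel.
volume = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
print(global_average_pool(volume))  # 4.5
```

Pooling each channel down to a single number is what lets the fully connected layer accept clips regardless of their spatio-temporal size.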

Dataset used

The model was trained using 615 video snippets scraped from the internet, with clip lengths ranging from 6 to 100 seconds. A subset of the original dataset is available for download here. To train the model, download the dataset and place it in the root directory.
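
Because clip lengths vary widely, each clip has to be reduced to a fixed number of frames before it can be batched. The exact sampling strategy used in the notebook is not described here, so the sketch below shows only one common option, uniform sampling. The function name `uniform_frame_indices` and the 25 fps frame rate in the example are assumptions.

```python
def uniform_frame_indices(total_frames, num_samples):
    """Pick num_samples frame indices spread evenly across a clip.

    The clip is split into num_samples equal-width segments and the
    frame at the centre of each segment is chosen. Clips shorter than
    num_samples frames repeat some indices.
    """
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    return [
        min(total_frames - 1, int((k + 0.5) * total_frames / num_samples))
        for k in range(num_samples)
    ]


# A 6-second clip at an assumed 25 fps has 150 frames.
indices = uniform_frame_indices(total_frames=150, num_samples=40)
print(len(indices), indices[0], indices[-1])  # 40 1 148
```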

Running the code

Download the pretrained weights from here and place them in the same directory as the notebook.
Run the model in inference mode from the notebook.

Hyperparameter values used

Hyperparameter	Value used
Batch size	16
Batch accumulation	2
Sampled frames	40 (20x2)
Learning rate	0.006
Epochs	50
No. of GPUs	2
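
The sketch below collects the values from the table into a configuration object and derives the effective batch size implied by batch accumulation. The class and field names are hypothetical. It also assumes that the batch size of 16 is the total per forward pass rather than a per-GPU figure, and that all 615 clips are used for training; neither is stated above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters from the table (field names are hypothetical)."""
    batch_size: int = 16
    accumulation_steps: int = 2
    sampled_frames: int = 40
    learning_rate: float = 0.006
    epochs: int = 50
    num_gpus: int = 2

    @property
    def effective_batch_size(self) -> int:
        # Gradients from `accumulation_steps` batches are accumulated
        # before each optimizer step, so one update sees this many clips.
        return self.batch_size * self.accumulation_steps


def optimizer_steps_per_epoch(num_clips: int, config: TrainConfig) -> int:
    """Weight updates per epoch; a final partial group counts as one."""
    return math.ceil(num_clips / config.effective_batch_size)


config = TrainConfig()
print(config.effective_batch_size)             # 32
print(optimizer_steps_per_epoch(615, config))  # 20
```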

Results

The following table summarises the validation results of the 3D ConvNets:

Model Architecture	Accuracy (%)	F-score
R2plus1d_18	87.37	0.869
Mc3_18	85.35	0.856
R3d_18	84.11	0.847
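
For reference, the sketch below shows how accuracy and F-score can be computed from confusion-matrix counts. It assumes the reported F-score is the F1 score for binary classification (accident vs. no accident), which is not stated above. The counts in the example are made up for illustration and are not the project's results.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives and false negatives.

    With beta=1 this is the harmonic mean of precision and recall.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only (not taken from the validation run).
tp, fp, fn, tn = 40, 10, 10, 40
print(f"{accuracy(tp, fp, fn, tn):.3f}")  # 0.800
print(f"{f_score(tp, fp, fn):.3f}")       # 0.800
```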

To-do list

Deploy an app for easier access.
Convert the notebook code into modular scripts.
