Detecting accidents/crashes in CCTV footage.
This repository contains 3D Residual Network code for my minor project coursework titled 'Accident Identification using Deep Learning'. Please find the detailed report along side this repository.
Let x denote the input clip of size 3×L×H×W, where L is the number of frames inthe input video, H and W are the video frame height and width, and 3 refers to the RGB channels. In this model we consider each block to consist of two convolutional layers with a ReLU activation function after each layer, without the bottleneck layers. Let zi ,ai be the tensors computed by the ith convolutional block in the residual network and the activations obtained after applying ReLu function respectively The output of this ith residual block can be represented as:
a[i+2] = g[i] (a[i] + z[i+2])
The tensor zi in this case is 4D and has size Ni×L×Hi×Wi, where Ni is thenumber of filters used in the ith block. Each filter is 4-dimensional and it has size Ni×t×d×d where t denotes the temporal extent of the filter and d denotes thespatial extent of the filter. The filters are then convolved in 3 dimensions i.e., overboth time and space dimensions. The outputs from these convloutional layers are aggregated to the bottom layer where global average pooling takes place over the entire spatio-temporal volume and the final classification prediction is addressed by a fully connected layer as seen in the figure below:
3D ResNet Architecture. |
The model was trained using 615 video snippets scraped from the internet, with clip lengths ranging from 6 to 100 seconds. A subset of the original data-set is available for download here. To train the model, download the data set and place it in the root directory.
- Download the pretrained weight from here and place it in the directory of the notebook.
- Run the model in inference mode from the notebook.
Hyperparameter | Value used |
---|---|
Batch size | 16 |
Batch accumlation | 2 |
Sampled frames | 40 (20x2) |
Learning rate | 0.006 |
Epochs | 50 |
No. of GPU's | 2 |
The following table summarises the validation results of the 3D ConvNets:
Model Architecture | Accuracy | F-score |
---|---|---|
R2plus1d_18 | 87.37 | 0.869 |
Mc3_18 | 85.35 | 0.856 |
R3d_18 | 84.11 | 0.847 |
- Deploy an app for easier access.
- Convert the notebook code into modular scripts.