Ambulance CCTV Detection

Created using Python 3.7.10 and TensorFlow 2.4.1

Full Code

Installing Required Libraries

!pip install -r requirements.txt

Importing Libraries

import os
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt    # used by the plotting cells below

from PIL import Image
from skimage import transform
from tqdm import tqdm
from tensorflow.keras import models
from sklearn.utils import class_weight
from tensorflow.keras.models import load_model
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.preprocessing.image import ImageDataGenerator

Dataset Preparation

We use an open-source framework to download the dataset for the selected categories.

We use 4 categories; the image counts are given as (train, test, validation):

  1. Ambulance - (338, 51, 12)
  2. Bus - (1000, 247, 73)
  3. Car - (1000, 1000, 1000)
  4. Truck - (1000, 820, 269)

Framework: OIDv6.

Acquire the dataset for the selected categories, limiting each category to a maximum of 1000 images.

!oidv6 downloader --dataset OIDv6/ --type_data all --classes Ambulance Bus Car Truck Van --limit 1000 --yes 

The dataset will be saved in the following structure:

 ├── 📂test
 │ ├── 📂ambulance
 │ │ ├── 📂labels
 │ │ │ ├── 📃image1_label.txt...image(n)_label.txt
 │ │ ├── 🖼️image1.jpg...image(n).jpg
 │ ├── 📂bus
 │ ├── 📂car
 │ ├── 📂truck
 ├── 📂train
 ├── 📂validation
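
Because the generators used later derive class labels from these folder names, it can help to check the folder contents before training. The sketch below counts the .jpg files in each class folder and skips hidden folders such as `.ipynb_checkpoints`. The function name `count_images_per_class` is hypothetical and not part of the original code.

```python
import os

def count_images_per_class(split_dir):
    """Return {class_name: number_of_jpg_files} for one split folder (e.g. OIDv6/train)."""
    counts = {}
    for name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, name)
        # Skip plain files and hidden folders such as .ipynb_checkpoints
        if name.startswith('.') or not os.path.isdir(class_dir):
            continue
        counts[name] = sum(
            1 for f in os.listdir(class_dir) if f.lower().endswith('.jpg')
        )
    return counts

if __name__ == "__main__":
    split = "OIDv6/train"   # training folder used elsewhere in this document
    if os.path.isdir(split):
        print(count_images_per_class(split))
```

Only files directly inside a class folder are counted, so the label files in the `labels` sub-folder are ignored.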

Dataset Preprocessing

To reduce overfitting and increase variance in the training data, we use the ImageDataGenerator class to augment the training set. The generator labels each image according to the name of the folder that contains it.

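train_datagen = ImageDataGenerator(
    rescale=(1/255.),              # normalize pixel values by dividing by 255
    width_shift_range=0.2,         # random horizontal shift of up to 20% of the width
    height_shift_range=0.2,        # random vertical shift of up to 20% of the height
    zoom_range=0.2,                # random zoom in the range [0.8, 1.2]
    shear_range=0.2,               # random shear with intensity 0.2
    rotation_range=20,             # random rotation of up to 20 degrees
    brightness_range=[0.8,1.2],    # random brightness factor between 0.8 and 1.2
    horizontal_flip=True,          # randomly flip images horizontally
    #fill_mode="nearest"           # use fill mode if the dataset background is a plain color
)

# Test and validation images are only rescaled, not augmented.
# NOTE: several lines of this cell were truncated in the source. The name
# `test_datagen` and the generator calls below are reconstructed from the
# surviving arguments and from the generator names used later.
test_datagen = ImageDataGenerator(
    rescale=(1/255.)               # normalize pixel values by dividing by 255
)

traindir = "OIDv6/train"           # dataset directories
testdir = "OIDv6/test"
valtdir = "OIDv6/validation"

train_generator = train_datagen.flow_from_directory(
    traindir,
    target_size =(224, 224),       # resize images to 224 x 224 to match the model input
    class_mode='categorical',      # return one-hot encoded labels
    batch_size=32                  # 32 images per batch; steps per epoch = ceil(total images / batch size)
)

test_generator = test_datagen.flow_from_directory(
    testdir,
    target_size =(224, 224),       # class_mode and batch_size keep their defaults ('categorical', 32)
)

validation_generator = test_datagen.flow_from_directory(
    valtdir,
    target_size =(224, 224),
)

# The arguments of this call were lost in the source; 'balanced' weighting is an assumption.
class_weights = class_weight.compute_class_weight(
    class_weight='balanced', classes=np.unique(train_generator.classes), y=train_generator.classes
)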

Found 3338 images belonging to 5 classes.
Found 2243 images belonging to 5 classes.
Found 1399 images belonging to 5 classes.

{'.ipynb_checkpoints': 0, 'ambulance': 1, 'bus': 2, 'car': 3, 'truck': 4}
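
The class index above shows that the notebook folder `.ipynb_checkpoints` was picked up as a fifth class, and the counts under Dataset Preparation show that the ambulance class is much smaller than the others. As a worked example of how class weights can offset that imbalance, the sketch below applies the "balanced" heuristic, n_samples / (n_classes * class_count), to the training counts. It is plain Python for illustration; the helper name `balanced_class_weights` is an assumption, and this is not necessarily the exact call the original notebook made.

```python
def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {name: n_samples / (n_classes * c) for name, c in counts.items()}

# Training image counts from the Dataset Preparation section
train_counts = {'ambulance': 338, 'bus': 1000, 'car': 1000, 'truck': 1000}
weights = balanced_class_weights(train_counts)

for name, w in weights.items():
    print(f"{name}: {w:.3f}")
```

With these counts the ambulance class gets a weight of roughly 2.47 and each of the other classes roughly 0.83, so errors on ambulance images count about three times as much during training.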

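Plotting the Augmented Images

target_labels = sorted(next(os.walk(traindir))[1])   # class folder names, sorted like the generator's class index
batch = next(train_generator)
batch_images = np.array(batch[0])
batch_labels = np.array(batch[1])

target_labels = np.asarray(target_labels)

for n, i in enumerate(np.arange(10)):
    ax = plt.subplot(3,5,n+1)
    # The rest of the loop body was truncated in the source; the lines below are a reconstruction.
    plt.imshow(batch_images[i])
    plt.title(target_labels[np.argmax(batch_labels[i])])
    plt.axis('off')
plt.show()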
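
To make two of the augmentation settings concrete, here is a minimal NumPy sketch of a random horizontal flip followed by the 1/255 rescale. It is an illustration only, not how ImageDataGenerator is implemented; the function name `flip_and_rescale` and the random seed are assumptions.

```python
import numpy as np

def flip_and_rescale(image, rng, flip_prob=0.5):
    """Randomly mirror an HxWxC uint8 image left-right, then scale pixels to [0, 1]."""
    if rng.random() < flip_prob:
        image = image[:, ::-1, :]          # reverse the width axis
    return image.astype('float32') / 255.0

rng = np.random.default_rng(0)
# Tiny stand-in image (2 x 3 pixels, 3 channels) with values between 0 and 170
dummy = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3) * 10
augmented = flip_and_rescale(dummy, rng)
print(augmented.shape, augmented.min(), augmented.max())
```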


Building the Model

For the model, we apply transfer learning with MobileNet V2 and then fine-tune it on our dataset.

Defining Input Shape

IMG_SIZE = (224,224)
IMG_SHAPE = IMG_SIZE + (3,)    # Resulting shape: (224, 224, 3)

Instantiate a MobileNet V2 model pre-loaded with weights trained on ImageNet. By specifying the include_top = False argument, it doesn't include the classification layers at the top, which is ideal for feature extraction.

base_model = tf.keras.applications.MobileNetV2(
                 input_shape = IMG_SHAPE,
                 include_top = False,
                 weights = 'imagenet'
             )

Inside the Base Model

The base model converts each 224 x 224 x 3 image into a 7 x 7 x 1280 block of features.

image_batch, label_batch = next(iter(train_generator))
feature_batch = base_model(image_batch)

Freeze the convolutional layers

base_model.trainable = False

Adding Classification Head

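model = tf.keras.Sequential([
  base_model,                                     # frozen feature extractor
  tf.keras.layers.GlobalAveragePooling2D(), tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(256, activation='relu'), tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(5, activation='softmax')
])  # base, pooling and dropout layers restored from the model summary below; the 0.2 dropout rates are assumed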

Compiling the Model

base_learning_rate = 0.0001

Compiled Model After Freezing the Base Model and Adding the Classification Head

Model: "sequential_1"
Layer (type)                 Output Shape              Param #   
mobilenetv2_1.00_224 (Functi (None, 7, 7, 1280)        2257984   
global_average_pooling2d_1 ( (None, 1280)              0         
dropout_2 (Dropout)          (None, 1280)              0         
dense_2 (Dense)              (None, 256)               327936    
dropout_3 (Dropout)          (None, 256)               0         
dense_3 (Dense)              (None, 5)                 1285      
Total params: 2,587,205
Trainable params: 329,221
Non-trainable params: 2,257,984
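
The trainable parameter count in this summary can be checked by hand. With the base model frozen, only the two Dense layers have trainable weights, and a Dense layer with n inputs and m units has n*m weights plus m biases. A short check, using a hypothetical helper `dense_params`:

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

hidden = dense_params(1280, 256)   # dense_2: 1280 pooled features -> 256 units
output = dense_params(256, 5)      # dense_3: 256 units -> 5 classes
base = 2257984                     # frozen MobileNetV2 parameters, from the summary

print(hidden, output, hidden + output, base + hidden + output)
```

The four printed values match the summary: 327,936 and 1,285 for the Dense layers, 329,221 trainable parameters, and 2,587,205 parameters in total.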

Set 5 epochs to see the model's initial accuracy.

initial_epochs = 5

Train Model for 5 Epochs Before Fine-Tuning

history = model.fit(train_generator,              # call restored; the right-hand side was truncated in the source
             epochs = initial_epochs,
             validation_data = validation_generator)

Epoch 1/5
105/105 [==============================] - 247s 2s/step - loss: 1.5021 - accuracy: 0.3531 - val_loss: 0.6067 - val_accuracy: 0.8070
Epoch 2/5
105/105 [==============================] - 242s 2s/step - loss: 0.8688 - accuracy: 0.6671 - val_loss: 0.5626 - val_accuracy: 0.8063
Epoch 3/5
105/105 [==============================] - 240s 2s/step - loss: 0.8078 - accuracy: 0.6889 - val_loss: 0.4774 - val_accuracy: 0.8320
Epoch 4/5
105/105 [==============================] - 241s 2s/step - loss: 0.7713 - accuracy: 0.6911 - val_loss: 0.4868 - val_accuracy: 0.8234
Epoch 5/5
105/105 [==============================] - 243s 2s/step - loss: 0.7265 - accuracy: 0.7187 - val_loss: 0.4787 - val_accuracy: 0.8292
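
The 105 steps per epoch in this log follow from the batch size: a generator yields ceil(number_of_images / batch_size) batches per epoch, so the last batch may hold fewer than 32 images. A quick check with the image counts reported by the generators (the helper name `steps_per_epoch` is illustrative):

```python
import math

def steps_per_epoch(n_images, batch_size=32):
    """Number of batches needed to see every image once."""
    return math.ceil(n_images / batch_size)

print(steps_per_epoch(3338))   # training set: 3338 images
print(steps_per_epoch(2243))   # test set: 2243 images
```

This gives 105 steps for the training set and 71 for the test set, matching the progress bars in the training and evaluation logs.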


We unfreeze the last 20% of the base model's layers so that the model can learn features from our dataset while limiting overfitting.

#Un-Freeze Top Layer
base_model.trainable = True

# Fine-tune from this layer onwards
fine_tune_at = 123            # Freeze first 80% from total 154 Layers

# Freeze all the layers before the `fine_tune_at` layer
for layer in base_model.layers[:fine_tune_at]:
  layer.trainable =  False
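
The cut-off index 123 corresponds to 80% of the 154 layers, rounded down. The sketch below shows that arithmetic and the slicing pattern on stand-in objects, so it runs without TensorFlow. `FakeLayer` is a hypothetical placeholder for a Keras layer.

```python
class FakeLayer:
    """Stand-in for a Keras layer; only the `trainable` flag matters here."""
    def __init__(self):
        self.trainable = True

total_layers = 154
freeze_fraction = 0.8
fine_tune_at = int(total_layers * freeze_fraction)   # 154 * 0.8 = 123.2, rounded down

layers = [FakeLayer() for _ in range(total_layers)]
for layer in layers[:fine_tune_at]:
    layer.trainable = False          # freeze the first 80% of the layers

n_trainable = sum(layer.trainable for layer in layers)
print(fine_tune_at, n_trainable)
```

With these numbers, 31 of the 154 layers remain trainable.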

Compiled Model After Fine-Tuning

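model.compile(loss = 'categorical_crossentropy',   # line restored; loss assumed from the softmax output and one-hot labels
              optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule),   # lr_schedule is defined in the next section
              metrics = ['accuracy'])
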
Model: "sequential_1"
Layer (type)                 Output Shape              Param #   
mobilenetv2_1.00_224 (Functi (None, 7, 7, 1280)        2257984   
global_average_pooling2d_1 ( (None, 1280)              0         
dropout_2 (Dropout)          (None, 1280)              0         
dense_2 (Dense)              (None, 256)               327936    
dropout_3 (Dropout)          (None, 256)               0         
dense_3 (Dense)              (None, 5)                 1285      
Total params: 2,587,205
Trainable params: 1,947,781
Non-trainable params: 639,424

Defining Scheduled Learning-rate Decay

To reduce overfitting even further, we use a scheduled learning-rate decay.

# Create the scheduled learning-rate decay
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    base_learning_rate,     # Base is 0.0001
    decay_steps = 50,       # LR is multiplied by 0.9 over every 50 steps (continuously, since staircase defaults to False)
    decay_rate = 0.9
)
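
With its default (non-staircase) setting, ExponentialDecay computes lr(step) = initial_lr * decay_rate ** (step / decay_steps). The plain-Python sketch below reproduces that formula with the values used here, so the decay can be inspected without TensorFlow. The function name `decayed_lr` is an assumption.

```python
def decayed_lr(step, initial_lr=0.0001, decay_steps=50, decay_rate=0.9):
    """Continuous exponential decay: the rate shrinks by 10% over every 50 steps."""
    return initial_lr * decay_rate ** (step / decay_steps)

for step in (0, 50, 100, 105):
    print(step, decayed_lr(step))
```

At 105 steps per epoch, the learning rate drops to roughly 80% of its starting value after one epoch of fine-tuning.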

Make Callback to Save Model Weights

checkpoint_path = "new_checkpoint/cp_rev_1.ckpt"         # Checkpoint Save Path
checkpoint_dir = os.path.dirname(checkpoint_path)

# Create a callback that saves the model's weights
cp_callback = tf.keras.callbacks.ModelCheckpoint(
                  filepath = checkpoint_path,
                  save_weights_only = True,
                  verbose = 1
              )


We then train the fine-tuned model for another 25 epochs

fine_tune_epochs = 25
total_epochs =  initial_epochs + fine_tune_epochs

history_fine = model.fit(train_generator,          # call restored; the right-hand side was truncated in the source
                        epochs = total_epochs,
                        initial_epoch = history.epoch[-1],
                        validation_data = validation_generator,
                        callbacks = [cp_callback])

Plotting the Training and Validation Accuracy and Loss

acc = history_fine.history['accuracy']
val_acc = history_fine.history['val_accuracy']

loss = history_fine.history['loss']
val_loss = history_fine.history['val_loss']

plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.ylim([0.5, 1])
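plt.plot([initial_epochs-1,initial_epochs-1],   # x-coordinates assumed; this line was truncated in the source
          plt.ylim(), label='Start Fine Tuning')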
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.ylim([0, 1.0])
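plt.plot([initial_epochs-1,initial_epochs-1],   # x-coordinates assumed; this line was truncated in the source
         plt.ylim(), label='Start Fine Tuning')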
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')


As the graph shows, the model does not suffer much from overfitting and has converged (the curves show no further significant change). It also reaches good training and validation accuracy after just 25 epochs of fine-tuning.

Evaluate Test Accuracy

loss, accuracy = model.evaluate(test_generator)
print('Test accuracy :', accuracy)
71/71 [==============================] - 129s 2s/step - loss: 0.5047 - accuracy: 0.8235
Test accuracy : 0.8234507441520691

Saving the Model

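Saving in the TensorFlow SavedModel format:

save_path = 'Model/TheATeam_model_ver2'
model.save(save_path)    # save call restored (truncated in the source); a path without a file extension is written as a SavedModel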

The exported model will be structured as follows:

 ├── 📂TheATeam_model_ver2
 │ ├── 📂assets                    # Contains files used by the TensorFlow graph (not used now).
 │ ├── 📂variables                 # Contains a standard training checkpoint
 │ ├── 📃saved_model.pb            # The saved model

We also save the model in HDF5 format:

file_name = 'TheATeam_model_ver2.h5'
model.save(file_name, save_format='h5')    # start of the call restored; only its last arguments survived in the source


To test the model, we classify frames from a video file (.mp4); a live CCTV input stream is not used yet.

Importing Required Libraries

import tensorflow as tf
import numpy as np
import cv2
import pytube
import os

from PIL import Image
from skimage import transform

Load the Model

my_model = tf.keras.models.load_model('Model/TheATeam_model_ver2', compile = True)

Make the compile argument True to compile the model after loading.

Inference Function

#5 Load and pre-process an image frame
def load_frames(frame):
    frames = Image.open(frame)                     # right-hand side truncated in the source; PIL is the assumed loader
    frames = np.array(frames).astype('float32')/255
    frames = transform.resize(frames, (224, 224, 3))
    frames = np.expand_dims(frames, axis=0)
    return frames

#1 Get the video
vidcap = cv2.VideoCapture('../ambulance.mp4')

#3 Convert the video into frame images (JPG format)
def getFrame(sec):
    vidcap.set(cv2.CAP_PROP_POS_MSEC, sec*1000)    # seek to the requested second (assumed; this step was truncated)
    hasFrames,image = vidcap.read()

    if hasFrames:
        # Specify the frame file path
        framePath = "../video-frames/"+str(count)+"_frame.jpg"
        # Save the frame as a JPG file
        cv2.imwrite(framePath, image)

        #4 Load and predict the frame directly
        image = load_frames(framePath)
        result = my_model.predict(image)

        #6 Print whether an ambulance was detected, with its probability (class index 1 is 'ambulance')
        predict_result = (str(count)+") Ambulance Detected: {}".format("%.3f" % result[0][1]) if result[0][1]>0.03
            else str(count)+") Ambulance not detected: {}".format("%.3f" % result[0][1]))
        print(predict_result)
    return hasFrames

sec = 0
frameRate = 5               # Seconds between captured frames
count = 1                   # Video frame count
success = getFrame(sec)     # Initial call to get and predict the first frame

#2 Loop to get and predict the remaining frames
while success:
    count = count + 1
    sec = sec + frameRate
    sec = round(sec, 2)
    success = getFrame(sec)
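
The decision in step #6 depends on two details: index 1 of the prediction vector is the ambulance class (see the class index printed earlier), and the detection threshold is 0.03. The sketch below isolates that logic in a small pure-Python function so it can be checked without a model or a video. `describe_frame` is a hypothetical name, and the probability vectors are made-up inputs.

```python
AMBULANCE_INDEX = 1   # position of 'ambulance' in the class index
THRESHOLD = 0.03      # threshold used in the inference code

def describe_frame(count, probabilities):
    """Return the per-frame message for one softmax output vector."""
    p = probabilities[AMBULANCE_INDEX]
    label = "Ambulance Detected" if p > THRESHOLD else "Ambulance not detected"
    return "{}) {}: {:.3f}".format(count, label, p)

print(describe_frame(1, [0.0, 0.91, 0.02, 0.05, 0.02]))
print(describe_frame(2, [0.0, 0.01, 0.04, 0.90, 0.05]))
```

A threshold this low flags a frame even when another class has the highest probability. Comparing against the arg-max class would be a stricter alternative, at the cost of missing low-confidence ambulances.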


Example Resulted Frame



Actual Moving Frames

The example video that we used: Link.


We use Flask (a Python framework) to deploy the model on a server and serve it as a REST API. When a request is sent to the API, it returns the predicted value in JSON format.

We transfer the previous test code into Flask and add a socket so that a prediction can be requested at any time.

import json
from flask_socketio import emit   # `emit` is assumed to come from Flask-SocketIO; the original imports are not shown

def predict_process():
    # Load the model
    my_model = tf.keras.models.load_model('./TheATeam_model_ver2.h5', compile=True)

    #5 Load and pre-process an image frame
    def load_frames(frame):
        frames = Image.open(frame)                     # right-hand side truncated in the source; PIL is the assumed loader
        frames = np.array(frames).astype('float32')/255
        frames = transform.resize(frames, (224, 224, 3))
        frames = np.expand_dims(frames, axis=0)
        return frames

    #1 Get the video
    vidcap = cv2.VideoCapture('../ambulance.mp4')

    #3 Convert the video into frame images (JPG format)
    def getFrame(sec):
        vidcap.set(cv2.CAP_PROP_POS_MSEC, sec*1000)    # seek to the requested second (assumed; this step was truncated)
        hasFrames,image = vidcap.read()

        if hasFrames:
            # Specify the frame file path
            framePath = "../video-frames/"+str(count)+"_frame.jpg"
            # Save the frame as a JPG file
            cv2.imwrite(framePath, image)

            #4 Load and predict the frame directly
            image = load_frames(framePath)
            result = my_model.predict(image)

            #6 Return whether an ambulance was detected, its probability, and the frame number
            predict_result = {
                "ambulance_detected": 1 if result[0][1] > 0.03 else 0,
                "frame_number": count,
                "precentage": "{}".format("%.3f" % result[0][1])
            }
            emit('predict_result', json.dumps(predict_result), broadcast=True)

        return hasFrames

    sec = 0
    frameRate = 5 # Seconds between captured frames
    count = 1     # Video frame count (missing in the source; getFrame needs it)
    success = getFrame(sec) # Initial call to get and predict the first frame

    # Loop to get and predict the remaining frames
    while success:
        count = count + 1
        sec = sec + frameRate
        sec = round(sec, 2)
        success = getFrame(sec)

# Define the socket event: a client emits 'predict' and reads the results on the 'predict_result' listener
@socketio.on('predict')   # `socketio` is assumed to be the app's flask_socketio.SocketIO instance (not shown in the source)
def predict(data):
    emit('predict_result', 'Predict Start', broadcast=True)
    predict_process()     # call restored; it was missing between the two messages in the source
    emit('predict_result', 'Predict End', broadcast=True)
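
On the receiving side, each per-frame `predict_result` message is a JSON string. The sketch below builds a payload of the same shape and parses it back, using only the standard library. The helper name `build_payload` and the sample values are assumptions; the key spelling `precentage` is kept exactly as it appears in the code so that a listener would match it.

```python
import json

def build_payload(frame_number, ambulance_prob, threshold=0.03):
    """Serialise one frame's result in the shape emitted by predict_process."""
    return json.dumps({
        "ambulance_detected": 1 if ambulance_prob > threshold else 0,
        "frame_number": frame_number,
        "precentage": "%.3f" % ambulance_prob,
    })

message = build_payload(3, 0.4271)
decoded = json.loads(message)          # what a listener would do with the message
print(decoded["ambulance_detected"], decoded["frame_number"], float(decoded["precentage"]))
```

Note that the probability travels as a string, so a client has to convert it with `float` before comparing or plotting it.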

Thanks to: