This project demonstrates how to train an object detection model with TensorFlow and MobileNetV2 on the Caltech Birds 2010 dataset. The model predicts a bounding box around the bird in each image, and its performance is evaluated with Intersection over Union (IoU).
Kaggle Notebook: https://www.kaggle.com/code/cheesecke/object-detection-and-localization/
Object detection is a core computer vision task: identifying one or more objects in an image and localizing each of them with a bounding box. This project uses MobileNetV2 as a feature extractor and trains a regression head on top of it to predict the bounding box of the bird in each image of the Caltech Birds 2010 dataset. It also includes utilities for model training, IoU evaluation, and visualization.
The Caltech Birds 2010 dataset contains images of 200 bird species. It is divided into training and test sets, with bounding box annotations for each image.
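A minimal input pipeline might look like the sketch below. It assumes the data comes from the TensorFlow Datasets build `caltech_birds2010`, whose `bbox` feature stores one box per image as normalized `[ymin, xmin, ymax, xmax]`. The 224×224 input size, batch size and shuffle buffer are illustrative choices, not values taken from the notebook, and `preprocess`/`load_split` are hypothetical helper names.

```python
import tensorflow as tf

IMG_SIZE = 224  # assumed input size (MobileNetV2's default ImageNet resolution)


def preprocess(example):
    """Turn one TFDS example into an (image, bbox) pair for box regression."""
    image = tf.image.resize(example["image"], (IMG_SIZE, IMG_SIZE))  # float32 output
    # Scale pixels from [0, 255] to [-1, 1], the range MobileNetV2 expects
    # (the same scaling as tf.keras.applications.mobilenet_v2.preprocess_input).
    image = image / 127.5 - 1.0
    # Boxes are normalized [ymin, xmin, ymax, xmax], so resizing does not change them.
    bbox = tf.cast(example["bbox"], tf.float32)
    return image, bbox


def load_split(split, batch_size=32):
    """Load a Caltech Birds 2010 split (assumed TFDS name) as batched (image, bbox) pairs."""
    import tensorflow_datasets as tfds  # only needed when actually loading data

    ds = tfds.load("caltech_birds2010", split=split)
    if split == "train":
        ds = ds.shuffle(1000)  # shuffle raw examples before the costly resize
    ds = ds.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

`load_split("train")` and `load_split("test")` then yield `(image, bbox)` batches that can be passed straight to `model.fit`. Keeping the boxes normalized means the regression targets do not depend on the original image size.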
The model architecture consists of:
- Feature Extractor: MobileNetV2, pre-trained on ImageNet, extracts features from the input images.
- Pooling and Dense Layers: global average pooling followed by dense layers that process the extracted features.
- Bounding Box Regression: a final dense layer that predicts the bounding box coordinates.
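One way to express this architecture in Keras is sketched below. Freezing the backbone, the hidden-layer widths (256 and 64 units) and the sigmoid on the output layer are assumptions made for illustration; the sigmoid simply keeps the four predicted coordinates in the same normalized [0, 1] range as the targets. `build_model` is a hypothetical name.

```python
import tensorflow as tf


def build_model(img_size=224, weights="imagenet"):
    """MobileNetV2 backbone + pooling + dense head regressing one box."""
    # Feature extractor: MobileNetV2 without its ImageNet classification head.
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=(img_size, img_size, 3), include_top=False, weights=weights
    )
    backbone.trainable = False  # assumption: keep the pre-trained features frozen

    inputs = tf.keras.Input(shape=(img_size, img_size, 3))
    x = backbone(inputs, training=False)  # run BatchNorm layers in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)  # (batch, 1280) feature vector
    x = tf.keras.layers.Dense(256, activation="relu")(x)  # hypothetical width
    x = tf.keras.layers.Dense(64, activation="relu")(x)  # hypothetical width
    # Bounding box regression: [ymin, xmin, ymax, xmax], squashed into [0, 1].
    outputs = tf.keras.layers.Dense(4, activation="sigmoid", name="bbox")(x)
    return tf.keras.Model(inputs, outputs)
```

`build_model()` downloads the ImageNet weights on first use; `build_model(weights=None)` builds the same graph with random weights, which is handy for quick shape checks.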
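The model is trained with stochastic gradient descent (TensorFlow's SGD optimizer) to minimize the mean squared error (MSE) between the predicted and ground-truth box coordinates. Each training step processes a batch of preprocessed images together with their bounding box annotations.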
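The training setup can be sketched with Keras' built-in loop, which iterates over the batches for you. The learning rate and epoch count are placeholder values, not the notebook's settings, and `train` is a hypothetical helper name.

```python
import tensorflow as tf


def train(model, train_ds, val_ds, epochs=10, learning_rate=0.01):
    """Fit a box-regression model with plain SGD and mean squared error."""
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
        loss="mse",  # mean of squared differences over the 4 box coordinates
    )
    # fit() loops over the (image, bbox) batches each epoch and records
    # training and validation loss in history.history.
    return model.fit(train_ds, validation_data=val_ds, epochs=epochs)
```

The returned `history.history["loss"]` and `history.history["val_loss"]` hold one value per epoch, which is what a training/validation loss curve plots.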
Model performance is measured with Intersection over Union (IoU): the area where a predicted box overlaps its ground-truth box, divided by the area of their union. An IoU of 1 means a perfect match and 0 means no overlap. Training and validation loss curves are plotted to track training progress.
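For boxes stored as `[ymin, xmin, ymax, xmax]`, IoU can be computed as below. This NumPy version is an illustrative sketch rather than the notebook's own helper; it accepts single boxes or arrays of shape `(N, 4)`, so the mean IoU over a test set is `iou(pred_boxes, true_boxes).mean()`.

```python
import numpy as np


def iou(box_a, box_b):
    """Intersection over Union for boxes in [ymin, xmin, ymax, xmax] format.

    Works on single boxes (shape (4,)) or batches (shape (N, 4)); both
    arguments must use the same units (e.g. normalized or pixel coordinates).
    """
    a = np.asarray(box_a, dtype=float)
    b = np.asarray(box_b, dtype=float)
    # Corners of the intersection rectangle.
    y1 = np.maximum(a[..., 0], b[..., 0])
    x1 = np.maximum(a[..., 1], b[..., 1])
    y2 = np.minimum(a[..., 2], b[..., 2])
    x2 = np.minimum(a[..., 3], b[..., 3])
    # Clip to zero so non-overlapping boxes get an empty intersection.
    inter = np.clip(y2 - y1, 0, None) * np.clip(x2 - x1, 0, None)
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    union = area_a + area_b - inter
    # Guard against division by zero for degenerate (zero-area) boxes.
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)
```

For example, `iou([0, 0, 2, 2], [1, 1, 3, 3])` gives 1/7: the boxes share a 1×1 square, and their union covers 4 + 4 − 1 = 7 units of area.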
Visualization utilities display each image with its predicted and ground-truth bounding boxes, annotated with the IoU score of the prediction.