Object Detection and Localization with TensorFlow and MobileNetV2

This project demonstrates how to train an object localization model using TensorFlow and MobileNetV2 on the Caltech Birds 2010 dataset. The model predicts a bounding box around the bird in each image, and its performance is evaluated using Intersection over Union (IoU).

Kaggle Notebook: https://www.kaggle.com/code/cheesecke/object-detection-and-localization/

Overview
Installation
Dataset
Model Architecture
Training
Evaluation
Visualization

Overview

Object detection is a core computer vision task in which a model identifies and localizes objects within an image. This project uses MobileNetV2 as a feature extractor and trains a model to predict a bounding box around the bird in each image from the Caltech Birds 2010 dataset. It includes utilities for model training, evaluation, and visualization.

Dataset

The Caltech Birds 2010 dataset contains images of 200 bird species. It is divided into training and test sets, with bounding box annotations for each image.
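
A minimal preprocessing sketch for one annotated example follows. It assumes each example is a dictionary with an "image" tensor and a normalized "bbox" tensor ordered [ymin, xmin, ymax, xmax], which is the TensorFlow Datasets convention. The function name, the 224x224 input size, and the [xmin, ymin, xmax, ymax] output order are illustrative choices, not confirmed details of this project.

```python
import tensorflow as tf

IMAGE_SIZE = 224  # assumed input resolution (MobileNetV2's default)


def preprocess(example):
    """Resize the image, scale pixels to [-1, 1], and reorder the box."""
    # tf.image.resize returns float32 values in the original 0..255 range.
    image = tf.image.resize(example["image"], (IMAGE_SIZE, IMAGE_SIZE))
    image = image / 127.5 - 1.0  # MobileNetV2 expects inputs in [-1, 1]

    # Assumed input order: [ymin, xmin, ymax, xmax], normalized to [0, 1].
    # Normalized coordinates are unaffected by the resize.
    ymin, xmin, ymax, xmax = tf.unstack(example["bbox"])
    bbox = tf.stack([xmin, ymin, xmax, ymax])
    return image, bbox
```

If the data is loaded through TensorFlow Datasets, for example with tfds.load("caltech_birds2010", split="train"), a function like this could be applied with dataset.map(preprocess) before batching.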

Model Architecture

The model architecture consists of:

Feature Extractor: MobileNetV2 pre-trained on ImageNet to extract features from input images.
Dense Layers: Global Average Pooling and dense layers for feature processing.
Bounding Box Regression: Dense layer predicting bounding box coordinates.
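
The three components above can be sketched in Keras roughly as follows. The layer widths (1024 and 512), the input size, and the function name build_model are assumptions made for illustration; the original notebook may use different values.

```python
import tensorflow as tf


def build_model(input_shape=(224, 224, 3), weights="imagenet"):
    """MobileNetV2 backbone followed by a small bounding box regression head."""
    # Feature extractor: MobileNetV2 without its ImageNet classification head.
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights
    )

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs)

    # Dense layers: pool the feature map, then process it.
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(1024, activation="relu")(x)  # assumed width
    x = tf.keras.layers.Dense(512, activation="relu")(x)   # assumed width

    # Bounding box regression: four coordinates, no activation.
    outputs = tf.keras.layers.Dense(4, name="bounding_box")(x)
    return tf.keras.Model(inputs, outputs)
```

Passing weights=None builds the same architecture with random initialization, which avoids downloading the ImageNet weights when only the structure is being inspected.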

Training

The model is trained using TensorFlow's stochastic gradient descent (SGD) optimizer with mean squared error (MSE) loss between the predicted and ground truth box coordinates. Training iterates over batches of preprocessed images and their corresponding bounding box annotations.
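
As a rough sketch of this setup, the snippet below compiles a model with SGD and MSE loss and fits it on batched (image, box) pairs. The learning rate, the helper name compile_and_fit, and the tiny stand-in model and random data are all assumptions so that the example runs quickly on its own; they are not the project's actual configuration.

```python
import numpy as np
import tensorflow as tf


def compile_and_fit(model, dataset, epochs=1, learning_rate=0.01):
    """Compile with SGD + MSE and train on batches of (image, box) pairs."""
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
        loss="mse",
    )
    return model.fit(dataset, epochs=epochs, verbose=0)


# Synthetic stand-in data: 8 small random images with random normalized boxes.
rng = np.random.default_rng(0)
images = rng.uniform(-1, 1, size=(8, 32, 32, 3)).astype("float32")
boxes = rng.uniform(0, 1, size=(8, 4)).astype("float32")
dataset = tf.data.Dataset.from_tensor_slices((images, boxes)).batch(4)

# Tiny stand-in model; the MobileNetV2-based model would be used in practice.
tiny_model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4),
])

history = compile_and_fit(tiny_model, dataset, epochs=2)
```

The returned history object records one loss value per epoch in history.history["loss"], which is what loss curves are typically plotted from.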

Figure: Training sample

Evaluation

Model performance is evaluated using Intersection over Union (IoU), which compares each predicted bounding box with its ground truth annotation. Training and validation loss curves are plotted to assess training progress.
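
IoU is the area of overlap between two boxes divided by the area of their union, so it ranges from 0 (no overlap) to 1 (identical boxes). A minimal sketch for a single pair of boxes follows; the [xmin, ymin, xmax, ymax] ordering is an assumption, and the function name is illustrative.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Guard against degenerate boxes with zero total area.
    return intersection / union if union > 0 else 0.0
```

For example, the boxes (0, 0, 2, 2) and (1, 1, 3, 3) overlap in a 1x1 square, giving an IoU of 1 / (4 + 4 - 1) = 1/7, or about 0.14.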

Figure: Evaluation sample

Visualization

Visualization utilities are provided to display images with predicted and ground truth bounding boxes, highlighting IoU scores for each prediction.
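
One way such a utility might look with Matplotlib is sketched below. It assumes boxes are normalized [xmin, ymin, xmax, ymax] values; the function name show_boxes, the colors, and the blank demo image are illustrative and not taken from the project.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Rectangle


def show_boxes(image, true_box, pred_box, iou=None):
    """Draw ground truth (green) and predicted (red) boxes over an image."""
    height, width = image.shape[:2]
    fig, ax = plt.subplots()
    ax.imshow(image)

    for box, color, label in (
        (true_box, "lime", "ground truth"),
        (pred_box, "red", "prediction"),
    ):
        xmin, ymin, xmax, ymax = box
        # Convert normalized coordinates to pixels for drawing.
        ax.add_patch(Rectangle(
            (xmin * width, ymin * height),
            (xmax - xmin) * width,
            (ymax - ymin) * height,
            fill=False, edgecolor=color, linewidth=2, label=label,
        ))

    if iou is not None:
        ax.set_title(f"IoU = {iou:.2f}")
    ax.legend(loc="upper right")
    ax.axis("off")
    return ax


# Demo on a blank image with two overlapping boxes.
demo_image = np.zeros((100, 200, 3), dtype=np.uint8)
ax = show_boxes(demo_image, (0.1, 0.1, 0.5, 0.5), (0.2, 0.2, 0.6, 0.6), iou=0.39)
```

In a notebook the figure renders inline; in a script, plt.show() or fig.savefig would be needed to see it.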

kunalarora0930/object-etection-and-localization