Capstone Project: FusionBites

Project Overview

The project aims to automate the identification of images that fit "FusionBites", a new food concept that blends sushi and sandwiches. This involves analyzing a dataset of images labeled as either sushi or sandwiches.

Problem Statement

The challenge is to build an automated workflow that accurately identifies potential instances of "FusionBites" in the given dataset. This means processing images labeled as sushi or sandwiches and determining whether each one could be classified as "FusionBites", a combination of the two.

Methodology

This project uses a Convolutional Neural Network (CNN)-based approach to perform multilabel image classification. Two architectures were evaluated:

Custom CNN: A simple custom-built CNN model with multiple Conv2D layers followed by MaxPooling2D layers. This model also incorporates Dropout layers for regularization.
MobileNetV2: A pre-trained MobileNetV2 model is used as the base model, with an additional Dense layer on top to perform the multilabel classification.
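
The activation of the final Dense layer is not stated above. For multilabel classification trained with binary cross-entropy, a sigmoid on each output is the usual choice, because it gives every label an independent probability. The sketch below assumes two outputs ordered as (sushi, sandwich) and uses hypothetical logit values. It contrasts sigmoid with softmax: under sigmoid both labels can be confidently active at once, which is what a "FusionBites" image requires, whereas softmax forces the labels to compete for a total probability of 1.

```python
import math

LABELS = ("sushi", "sandwich")  # assumed output order


def sigmoid(z: float) -> float:
    """Independent probability for a single label."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs: list[float]) -> list[float]:
    """Probabilities that compete with each other and sum to 1."""
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.5]  # hypothetical raw outputs of the final Dense layer
sig = [sigmoid(z) for z in logits]
soft = softmax(logits)

print({name: round(p, 3) for name, p in zip(LABELS, sig)})   # {'sushi': 0.881, 'sandwich': 0.818}
print({name: round(p, 3) for name, p in zip(LABELS, soft)})  # {'sushi': 0.622, 'sandwich': 0.378}
```

With sigmoid, both probabilities exceed 0.5, so both labels can be switched on; with softmax, raising one label necessarily lowers the other.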

The models were trained using the binary cross-entropy loss function, which is suitable for multilabel classification problems.
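
As a reference for what the loss computes, here is a minimal pure-Python sketch of mean binary cross-entropy over all (sample, label) pairs, L = -(1/N) Σ [y·log(p) + (1 − y)·log(1 − p)]. The function name and the example values are hypothetical; the small epsilon clip keeps log() finite, and frameworks apply a similar safeguard in their own implementations.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over every (sample, label) pair.

    y_true: rows of 0/1 labels, e.g. [sushi, sandwich]
    y_pred: rows of predicted probabilities in the same order
    """
    total, count = 0.0, 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
            total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count


y_true = [[1, 0], [1, 1]]          # hypothetical labels
y_pred = [[0.9, 0.2], [0.6, 0.7]]  # hypothetical sigmoid outputs
print(round(binary_cross_entropy(y_true, y_pred), 4))  # 0.299
```

Each label contributes its own term, which is why this loss fits multilabel problems: an image labeled both sushi and sandwich is penalized separately for each label it gets wrong.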

Images were preprocessed and augmented using ImageDataGenerator from Keras, which can generate batches of tensor image data with real-time data augmentation.
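
The exact augmentation settings used in this project are not listed here. As an illustration only, the plain NumPy sketch below mimics what two common ImageDataGenerator options, rescale=1./255 and horizontal_flip=True, do to a single image in (height, width, channels) layout. The augment helper is hypothetical and is not part of Keras.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator, flip_prob: float = 0.5) -> np.ndarray:
    """Hypothetical helper mirroring rescale=1./255 and horizontal_flip=True."""
    out = image.astype(np.float32) / 255.0  # rescale uint8 pixels into [0, 1]
    if rng.random() < flip_prob:
        out = out[:, ::-1, :]  # mirror left-right along the width axis
    return out


rng = np.random.default_rng(42)  # seeded so the random flips are reproducible
img = rng.integers(0, 256, size=(4, 6, 3)).astype(np.uint8)  # tiny fake RGB image
batch = np.stack([augment(img, rng) for _ in range(8)])  # each copy flipped or not at random
print(batch.shape)  # (8, 4, 6, 3)
```

Because the flip is drawn independently for every image in every batch, the model sees slightly different versions of the same photo across epochs, which is the point of real-time augmentation.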

The data was split into training and testing sets, and the models were trained on the training set while validation was performed on the testing set.
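
A reproducible split can be sketched as follows, assuming the split operates on image file paths and uses the "test_size" (a count of images) and "seed" values described under Configuration. The split_paths function and the file names are hypothetical; the project's own splitting code may differ.

```python
import random


def split_paths(paths, test_size, seed):
    """Hypothetical helper: deterministic shuffle, then hold out `test_size` items."""
    if not 0 < test_size < len(paths):
        raise ValueError("test_size must be between 1 and len(paths) - 1")
    shuffled = sorted(paths)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)  # private RNG seeded from the config "seed"
    return shuffled[test_size:], shuffled[:test_size]  # (train, test)


paths = [f"img_{i:03d}.jpg" for i in range(10)]  # hypothetical file names
train, test = split_paths(paths, test_size=3, seed=42)
print(len(train), len(test))  # 7 3
```

Using a dedicated random.Random instance keeps the split reproducible without touching the global random state that other code might rely on.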

Metrics

The performance of the models was evaluated using the following metrics:

Accuracy: This is the proportion of the total number of predictions that were correct. It is a useful measure when the target classes are balanced.
Loss (Binary Cross Entropy): Since this is a multilabel classification problem, binary cross-entropy loss was used. It measures the performance of a classification model whose output is a probability value between 0 and 1. The loss increases as the predicted probability diverges from the actual label.
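
In a multilabel setting, "accuracy" can be computed in more than one way, and the choice changes the number noticeably. The sketch below, using hypothetical labels in (sushi, sandwich) order, compares per-label (binary) accuracy, which scores each label decision separately, with exact-match accuracy, which only counts an image as correct when every label is right. Which variant the training logs report depends on how the metric is configured; that is not specified here.

```python
def binary_accuracy(y_true, y_pred_bin):
    """Fraction of individual (sample, label) decisions that are correct."""
    pairs = [(t, p) for tr, pr in zip(y_true, y_pred_bin) for t, p in zip(tr, pr)]
    return sum(t == p for t, p in pairs) / len(pairs)


def exact_match_accuracy(y_true, y_pred_bin):
    """Fraction of samples whose entire label vector is predicted correctly."""
    return sum(list(tr) == list(pr) for tr, pr in zip(y_true, y_pred_bin)) / len(y_true)


y_true = [[1, 0], [0, 1], [1, 1], [0, 1]]  # hypothetical ground truth
y_pred = [[1, 0], [0, 1], [1, 0], [1, 1]]  # hypothetical thresholded predictions
print(binary_accuracy(y_true, y_pred))       # 0.75
print(exact_match_accuracy(y_true, y_pred))  # 0.5
```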

The training process records these metrics for both training and validation data for each epoch, allowing for evaluation of how well the model is learning over time.

Note: Given the multilabel nature of the task, accuracy alone may not be the best metric. Additional metrics such as precision, recall, and F1-score, or a multilabel confusion matrix, would give a more detailed picture of performance.
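
Per-label precision, recall, and F1 can be computed directly from each label's true positives, false positives, and false negatives; together with true negatives, these four counts form that label's 2×2 confusion matrix. The pure-Python sketch below uses hypothetical data and a hypothetical function name. scikit-learn provides established equivalents in sklearn.metrics.precision_recall_fscore_support and sklearn.metrics.multilabel_confusion_matrix.

```python
def per_label_scores(y_true, y_pred_bin, labels=("sushi", "sandwich")):
    """Precision, recall and F1 for each label, plus its confusion counts."""
    scores = {}
    for j, name in enumerate(labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred_bin)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        tn = len(pairs) - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[name] = {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
                        "precision": precision, "recall": recall, "f1": f1}
    return scores


y_true = [[1, 0], [0, 1], [1, 1], [0, 1]]  # hypothetical ground truth
y_pred = [[1, 0], [0, 1], [1, 0], [1, 1]]  # hypothetical thresholded predictions
for name, s in per_label_scores(y_true, y_pred).items():
    print(name, {k: round(v, 3) for k, v in s.items()})
```

Here both labels reach an F1 of 0.8, but for different reasons: sushi suffers from a false positive, sandwich from a missed positive. Accuracy alone would hide that distinction.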

Prediction Criteria

After training, the models are used to make predictions on the test set. For each image, the model outputs a probability for each class; if a class's probability is greater than a specified threshold, the image is assigned to that class. In this project, an image is considered a "FusionBites" food item if the model's predicted probability exceeds the specified threshold for that class.
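
The thresholding step can be sketched as below. This assumes the model has two outputs in (sushi, sandwich) order and that an image counts as "FusionBites" when both labels are active, which matches the "sushi or sandwich or both" wording in the configuration notes; the project's actual rule may differ. The function name is hypothetical, and the default threshold of 0.35 is taken from the "thresh" description.

```python
LABELS = ("sushi", "sandwich")  # assumed order of the model's outputs


def decode_prediction(probs, thresh=0.35):
    """Return the active labels and whether the image counts as FusionBites.

    Assumption: "FusionBites" means every label (sushi AND sandwich) is active.
    """
    active = [name for name, p in zip(LABELS, probs) if p > thresh]  # strictly greater, as described
    return active, len(active) == len(LABELS)


for probs in ([0.91, 0.12], [0.58, 0.41], [0.20, 0.30]):  # hypothetical model outputs
    print(probs, decode_prediction(probs))
# [0.91, 0.12] (['sushi'], False)
# [0.58, 0.41] (['sushi', 'sandwich'], True)
# [0.2, 0.3] ([], False)
```

A lower threshold such as 0.35 (rather than 0.5) makes it easier for a second label to switch on, which favors detecting mixed "FusionBites" images at the cost of more false positives.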

These predictions can be used to classify new images into their respective categories, enabling an automated categorization process for food images. The results of the predictions, including classified images and performance metrics, are saved in the results directory for further analysis and review.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Python 3.9+
pip

Installing

Clone the repository:

git clone https://github.com/yourusername/yourrepository.git

Install dependencies:

pip install -r requirements.txt

Configuration

Model training and evaluation are configured through the config.json file. Each item is briefly explained here:

"download_link": The URL of the dataset zip archive. The images in this archive are used for training and testing.
"train_dir" and "test_dir": The directories where the training and test data, respectively, are stored after the archive is downloaded and unzipped.
"model_output": The location where the trained model is saved.
"model": Selects the model to train: 1 for CustomCNN or 2 for fine-tuning MobileNetV2 (see the src.models.neural_net.py file).
"test_size": The number of test images used for model evaluation.
"epochs": The number of complete passes the learning algorithm makes through the training dataset.
"batch_size": The number of training examples used in one iteration.
"thresh": The threshold applied to each output neuron's activation to decide whether a label (sushi, sandwich, or both) is active. For example, in this project's configuration, any output value above 0.35 is treated as an active label.
"seed": The seed for the random number generator, used so that experiments can be reproduced exactly.
"learning_rate": A hyperparameter that controls how much the model weights change in response to the estimated error at each update.

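To catch configuration mistakes before a long training run, config.json can be loaded and sanity-checked up front. The sketch below is a hypothetical helper, not the project's loader: it only checks the keys listed above, assumes "model" is stored as the integer 1 or 2 and "thresh" as a float in (0, 1), and adds a small steps_per_epoch function that shows how "batch_size" relates to the number of iterations in one epoch.

```python
import json
import math
from pathlib import Path

REQUIRED_KEYS = {
    "download_link", "train_dir", "test_dir", "model_output", "model",
    "test_size", "epochs", "batch_size", "thresh", "seed", "learning_rate",
}


def parse_config(text: str) -> dict:
    """Parse config.json contents and sanity-check a few fields."""
    cfg = json.loads(text)
    missing = REQUIRED_KEYS - set(cfg)
    if missing:
        raise KeyError(f"config.json is missing keys: {sorted(missing)}")
    if cfg["model"] not in (1, 2):
        raise ValueError("model must be 1 (CustomCNN) or 2 (MobileNetV2 fine-tuning)")
    if not 0.0 < cfg["thresh"] < 1.0:
        raise ValueError("thresh must lie strictly between 0 and 1")
    if cfg["batch_size"] < 1 or cfg["epochs"] < 1:
        raise ValueError("batch_size and epochs must be positive")
    return cfg


def load_config(path: str = "config.json") -> dict:
    """Read the file from disk and validate it."""
    return parse_config(Path(path).read_text(encoding="utf-8"))


def steps_per_epoch(num_train_images: int, batch_size: int) -> int:
    """Iterations needed to see every training image once per epoch."""
    return math.ceil(num_train_images / batch_size)
```

For example, with 1,000 training images and a batch_size of 32, one epoch takes steps_per_epoch(1000, 32) = 32 iterations, the last batch holding only 8 images.
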
Data

The data is downloaded automatically the first time the training script is run.
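
The download-once behavior can be sketched with the standard library alone. This is a hypothetical helper, not the project's code: it assumes the archive at "download_link" is a zip file and treats a non-empty target directory as "already downloaded". The archive is fetched into a temporary directory so that no stray zip file is left behind. urllib.request.urlopen also accepts file:// URLs, which makes the function easy to try offline.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def ensure_dataset(download_link: str, data_root: str) -> Path:
    """Download and unzip the dataset once; later calls reuse the extracted files."""
    root = Path(data_root)
    if root.exists() and any(root.iterdir()):
        return root  # assumption: a non-empty directory means the data is already there
    root.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "dataset.zip"
        # Stream the download to disk instead of holding it all in memory.
        with urllib.request.urlopen(download_link) as resp, open(archive, "wb") as fh:
            shutil.copyfileobj(resp, fh)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)
    return root  # the temporary zip is deleted when the with-block exits
```

A more robust version would extract into a temporary directory and move it into place only on success, so that an interrupted extraction is not mistaken for a complete dataset.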

Training

To train the model, run:

python run_train.py

Built With

TensorFlow - The deep learning framework used.
MobileNetV2 - The convolutional neural network architecture used.

Medium Post

Read more about this project and the concept behind it in this blog post on Medium.

afifai/fusionbites-capstone-project