CDiscount-Image-Classification-Challenge: A Jupyter Notebook repository from Tejash-Shah

Overview

This challenge involves building a model that classifies products based on their images. The provided dataset has the following characteristics:

9 million products
More than 15 million images at 180x180 resolution
More than 5,000 categories

Dataset Description

train.bson (Size: 58.2 GB) - This is the main training dataset. It contains a list of 7,069,896 dictionaries, one per product. Each dictionary contains a product id, the category id of the product, and the product's images. Each image is stored as a binary string holding the image's JPEG-encoded bytes.
train_example.bson (Size: 616 KB) - This is a small sample to start with. It contains the first 100 records of train.bson, so the data can be explored before moving on to the full train.bson.
test.bson (Size: 14.53 GB) - This is the dataset on which predictions have to be made. It contains a list of 1,768,182 products in the same format as train.bson, except that there is no category id.
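Because train.bson is far too large to load into memory, it helps to stream it one product at a time. The following is a minimal sketch, not the repository's own loader: it relies only on the BSON rule that every document starts with a 4-byte little-endian length covering the whole document. The function names `iter_raw_bson_documents` and `count_documents` are hypothetical and chosen for illustration.

```python
import io
import struct


def iter_raw_bson_documents(stream):
    """Yield each top-level BSON document in `stream` as raw bytes.

    Only one document is held in memory at a time, so this also works
    on files much larger than RAM.
    """
    while True:
        prefix = stream.read(4)
        if not prefix:
            return  # clean end of file
        if len(prefix) < 4:
            raise ValueError("truncated BSON length prefix")
        # The int32 length counts the whole document, prefix included.
        (size,) = struct.unpack("<i", prefix)
        if size < 5:
            raise ValueError(f"invalid BSON document size: {size}")
        body = stream.read(size - 4)
        if len(body) != size - 4:
            raise ValueError("truncated BSON document")
        yield prefix + body


def count_documents(stream):
    """Count the documents in a BSON stream without decoding them."""
    return sum(1 for _ in iter_raw_bson_documents(stream))


if __name__ == "__main__":
    # Smallest valid BSON document: length 5 plus the terminating zero byte.
    empty_doc = struct.pack("<i", 5) + b"\x00"
    print(count_documents(io.BytesIO(empty_doc * 3)))
```

On the real files, the stream would be opened with `open("train_example.bson", "rb")`. Each raw document can then be decoded into a dictionary with `bson.decode(raw)` from PyMongo's `bson` package (or the whole loop replaced by `bson.decode_file_iter`), and an image's bytes can be opened with `PIL.Image.open(io.BytesIO(image_bytes))`. The field names inside each dictionary are not listed here, so inspect a decoded record from train_example.bson first.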

Algorithms Implemented

Baseline deep learning model with 3 convolutional layers
VGG16
VGG19
ResNet50
Inception
Keras CNN model
VGG19 with real-time data augmentation and convolutional layer tuning
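To make the first entry in the list concrete, here is a rough sketch of what a baseline with 3 convolutional layers could look like in Keras for 180x180 RGB inputs. It is an illustration under assumptions, not the notebook's actual architecture: the filter counts, kernel size, pooling layers, optimizer, and the function name `build_baseline_cnn` are all hypothetical choices.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_baseline_cnn(num_classes, input_shape=(180, 180, 3)):
    """Build a small CNN with exactly three convolutional layers.

    All hyperparameters below are illustrative assumptions.
    """
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        # Collapse each feature map to one value to keep the classifier small.
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # Sparse loss expects integer class indices rather than one-hot vectors.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

The value of `num_classes` should be the number of distinct category ids actually found in train.bson, and the raw category ids would need to be mapped to consecutive integer indices before training with this loss.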

Dependencies

Pandas
NumPy
TensorFlow
Matplotlib
PyMongo (provides the bson package)
Keras
os (standard library)
io (standard library)
PIL (installed as Pillow)
