Keras AlexNet: Dog vs. Cat Classification

I want to build a simple deep learning model for image classification on the Kaggle Dog vs. Cat dataset. For this project, I decided to use the AlexNet architecture because it was mentioned repeatedly during my machine learning course. The project is simple enough to help me understand AlexNet, get familiar with Keras, and gain more experience in the ML field.

Dataset

sample img

After downloading the dataset and extracting it from the zip file, let's view the data.

The dataset doesn't come with a label file, but I can extract the label from the image name in the train dataset.
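As a minimal sketch of that label extraction, assuming the train images are named like `cat.0.jpg` and `dog.0.jpg` (the helper name `label_from_filename` and the sample file list are made up for illustration):

```python
from pathlib import Path


def label_from_filename(filename: str) -> str:
    """Return 'cat' or 'dog' from a file name such as 'cat.0.jpg'."""
    # The label is assumed to be the part of the name before the first dot.
    label = Path(filename).name.split(".")[0].lower()
    if label not in {"cat", "dog"}:
        raise ValueError(f"unexpected file name: {filename!r}")
    return label


# Hypothetical file names, only to show the idea.
filenames = ["cat.0.jpg", "dog.0.jpg", "dog.1024.jpg"]
labels = [label_from_filename(name) for name in filenames]
print(list(zip(filenames, labels)))
```

In a real run the list of names would come from listing the extracted train folder instead of being typed by hand.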

I apply a data generator to add variety to the train dataset, which should help the model generalize and improve its accuracy. This also mimics a real-world dataset, because not every input image will be a perfect picture of a dog or a cat. Let's view a sample from the generator.
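One way such a generator could be configured in Keras is sketched below. Every parameter value here is an assumption chosen for illustration, not the setting used in this project. Note also that `ImageDataGenerator` is a legacy API in recent TensorFlow releases.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumed augmentation settings; tune them for your own data.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # scale pixel values from [0, 255] to [0, 1]
    rotation_range=15,        # random rotation of up to 15 degrees
    width_shift_range=0.1,    # random horizontal shift, fraction of width
    height_shift_range=0.1,   # random vertical shift, fraction of height
    zoom_range=0.2,           # random zoom in or out by up to 20%
    horizontal_flip=True,     # randomly mirror images left to right
)
```

Each batch drawn from a generator like this gets a fresh random combination of these transforms, so the model rarely sees exactly the same picture twice.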

The train dataset is split into 80% training and 20% validation, with the image generator applied to both. The data is now ready for training.
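The 80/20 split itself can be sketched in plain Python. This is only an illustration: the function name `split_train_val`, the seed, and the sample file names are assumptions, and the project may instead rely on the Keras generator's own `validation_split` argument.

```python
import random


def split_train_val(items, val_fraction=0.2, seed=42):
    """Shuffle items reproducibly and split them into (train, validation)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical file names standing in for the real train folder.
files = [f"cat.{i}.jpg" for i in range(50)] + [f"dog.{i}.jpg" for i in range(50)]
train_files, val_files = split_train_val(files)
print(len(train_files), len(val_files))  # 80 20
```

Shuffling before splitting matters here because the file list is ordered by class, so taking the last 20% without shuffling would put only dogs in the validation set.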

Deep Learning Model

As mentioned, I use the AlexNet architecture to build the model. AlexNet consists of five convolutional layers, some followed by max pooling layers, and then three fully connected layers. Since the dataset has only two classes (dog and cat), the last layer is a 2-way softmax, placed after the three fully connected layers.

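| Layer name   | Output size | Filters | Kernel size | Stride | Padding |
|--------------|-------------|---------|-------------|--------|---------|
| Input        | 227x227x3   | -       | -           | -      | -       |
| Convol_1     | 55x55x96    | 96      | 11x11       | 4      | valid   |
| MaxPool_1    | 27x27x96    | -       | 3x3         | 2      | valid   |
| Norm_1       | 27x27x96    | -       | -           | -      | -       |
| Convol_2     | 27x27x256   | 256     | 5x5         | 1      | same    |
| MaxPool_2    | 13x13x256   | -       | 3x3         | 2      | valid   |
| Norm_2       | 13x13x256   | -       | -           | -      | -       |
| Convol_3     | 13x13x384   | 384     | 3x3         | 1      | same    |
| Convol_4     | 13x13x384   | 384     | 3x3         | 1      | same    |
| Convol_5     | 13x13x256   | 256     | 3x3         | 1      | same    |
| MaxPool_3    | 6x6x256     | -       | 3x3         | 2      | valid   |
| FulConnect_1 | 4096        | -       | -           | -      | -       |
| FulConnect_2 | 4096        | -       | -           | -      | -       |
| FulConnect_3 | 1000        | -       | -           | -      | -       |
| FulConnect_4 | 2 (DogvCat) | -       | -           | -      | -       |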
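The output sizes in the table can be checked with the usual convolution arithmetic: `floor((n - k) / s) + 1` for "valid" padding and `ceil(n / s)` for "same" padding. The sketch below assumes "same" padding for Convol_2 through Convol_5, because a stride-1 convolution only keeps the 27x27 and 13x13 sizes that way. The helper name `conv_output_size` is made up for illustration.

```python
import math


def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer on a square input."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")


# (layer name, kernel size, stride, padding)
layer_specs = [
    ("Convol_1", 11, 4, "valid"),
    ("MaxPool_1", 3, 2, "valid"),
    ("Convol_2", 5, 1, "same"),
    ("MaxPool_2", 3, 2, "valid"),
    ("Convol_3", 3, 1, "same"),
    ("Convol_4", 3, 1, "same"),
    ("Convol_5", 3, 1, "same"),
    ("MaxPool_3", 3, 2, "valid"),
]

size = 227
sizes = {}
for name, kernel, stride, padding in layer_specs:
    size = conv_output_size(size, kernel, stride, padding)
    sizes[name] = size
    print(f"{name}: {size}x{size}")

# Number of inputs to the first fully connected layer after flattening.
flattened = size * size * 256
print("flattened:", flattened)  # flattened: 9216
```

For comparison, `conv_output_size(27, 5, 1, "valid")` gives 23, so "valid" padding on Convol_2 would shrink the feature map instead of keeping it at 27x27.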

Training

I am using a Huawei MateBook Pro with an 8th Gen Intel i7, 16GB RAM, and an NVIDIA GeForce MX150. It is definitely not a good laptop for running machine learning projects, so each epoch takes roughly 10-15 minutes. I decided to use a small number of epochs, but enough to get a decent result: I tried 3, then 10, and finally 20 epochs. If you have stronger hardware, increasing to 50 or so should yield a better result.

Let's graph the train loss, train accuracy, validation loss, and validation accuracy for 20 epochs.
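A minimal plotting sketch with matplotlib is shown here. It assumes the training history is a dict with the keys `loss`, `val_loss`, `accuracy`, and `val_accuracy`, which is what `model.fit(...).history` holds when the model is compiled with `metrics=["accuracy"]`. The function name `plot_history` and the numbers in `example_history` are made up. They are placeholders, not results from this project.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt


def plot_history(history):
    """Plot loss and accuracy curves side by side and return the figure."""
    epochs = range(1, len(history["loss"]) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(12, 4))

    ax_loss.plot(epochs, history["loss"], label="train loss")
    ax_loss.plot(epochs, history["val_loss"], label="validation loss")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()

    ax_acc.plot(epochs, history["accuracy"], label="train accuracy")
    ax_acc.plot(epochs, history["val_accuracy"], label="validation accuracy")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()
    return fig


# Placeholder values only, to show the expected input shape.
example_history = {
    "loss": [0.9, 0.7, 0.6],
    "val_loss": [0.95, 0.8, 0.75],
    "accuracy": [0.55, 0.65, 0.70],
    "val_accuracy": [0.52, 0.60, 0.64],
}
fig = plot_history(example_history)
```

With a real run, pass `history.history` from `model.fit` instead of the placeholder dict, then call `fig.savefig("training_curves.png")` to write the graph to disk.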

Result

Let's show some predictions alongside their images so we can judge the results better. I will use the first 20 images from the test results.
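Turning the 2-way softmax output into readable labels can be sketched as below. The class order (`cat` at index 0, `dog` at index 1) is an assumption that should be checked against the generator's `class_indices`. The function name `to_labels` and the sample probabilities are made up for illustration.

```python
CLASS_NAMES = ["cat", "dog"]  # assumed index order; verify with class_indices


def to_labels(probabilities, class_names=CLASS_NAMES):
    """Map each row of class probabilities to (label, confidence)."""
    results = []
    for row in probabilities:
        best = max(range(len(row)), key=lambda i: row[i])  # argmax
        results.append((class_names[best], row[best]))
    return results


# Made-up softmax outputs standing in for model.predict(...) on two images.
sample = [[0.9, 0.1], [0.2, 0.8]]
print(to_labels(sample))  # [('cat', 0.9), ('dog', 0.8)]
```

For the first 20 test images, the same function could be applied to the first 20 rows of the prediction array, and each label then used as the title of the matching image.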

result img

TADA!!! I now have a simple model that classifies pictures of dogs and cats.