Weed Species Classification and Bounding Box Regression

This project applies CNNs and the Keras API to image classification and regression on plant images, specifically weed species from the Plant Seedlings dataset (I worked on a subset). It covers data preparation, basic transfer learning with the VGG-16 model, a classification network, and a regression network. Regularization methods are applied to improve the model, and overfitting and the impact of regularization are discussed. The submission requires a Jupyter file containing the solution, and late submissions are not allowed. The project contributes to understanding CNNs, transfer learning, and handling small training data. It was completed as part of my Master's in Computer Vision at uOttawa (2023).

Required libraries: Keras, scikit-learn, pandas, matplotlib.
Execute cells in a Jupyter Notebook environment.
The uploaded code has been executed and tested successfully within the Google Colab environment.

Image classification and bounding box regression using transfer learning with a VGG-16 model.

The dataset comprises 4 classes with 250 images each, divided into training, validation, and testing sets. Image sizes differ. The classes are Cleavers, Common Chickweed, Maize, and Shepherd’s Purse.

Key Tasks Undertaken

Data Preparation:

Uploaded a subset of the dataset from Google Drive.
Extracted the dataset and organized it into 70% training, 15% validation, and 15% testing sets.
- Training Set
- Validation Set
- Testing Set
Loaded the data, resized images to 32x32 pixels, and created DataFrames for each set.

   Training Data Size: 700
   Training Data Label Counts:
   Shepherds_Purse     175
   Common_Chickweed    175
   Cleavers            175
   Maize               175
   Name: Label, dtype: int64 
   
   Size of the Images in Training Data: (32, 32, 3)
   ----------------------------------------------------------------
   
   Validation Data Size: 148
   Validation Data Label Counts:
   Shepherds_Purse     37
   Common_Chickweed    37
   Cleavers            37
   Maize               37
   Name: Label, dtype: int64 
   
   Size of the Images in Validation Data: (32, 32, 3)
   ----------------------------------------------------------------
   
   Test Data Size: 152
   Test Data Label Counts:
   Shepherds_Purse     38
   Common_Chickweed    38
   Cleavers            38
   Maize               38
   Name: Label, dtype: int64
   Size of the Images in Test Data: (32, 32, 3)
   ----------------------------------------------------------------
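
The 70/15/15 split described above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the helper name `split_per_class`, the placeholder file names, and the rounding rule (floor for training and validation, remainder to testing) are assumptions, chosen because they reproduce the reported 175/37/38 counts per class.

```python
import random
from collections import defaultdict


def split_per_class(samples, seed=42):
    """Split (path, label) pairs into train/valid/test, class by class.

    Hypothetical helper: the 70/15/15 ratios come from the text; the
    rounding rule is an assumption that matches the reported counts.
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)
    train, valid, test = [], [], []
    for label in sorted(by_label):
        paths = by_label[label]
        rng.shuffle(paths)
        # Integer arithmetic avoids floating-point surprises in the counts.
        n_train = len(paths) * 70 // 100
        n_valid = len(paths) * 15 // 100
        train += [(p, label) for p in paths[:n_train]]
        valid += [(p, label) for p in paths[n_train:n_train + n_valid]]
        test += [(p, label) for p in paths[n_train + n_valid:]]
    return train, valid, test


# Placeholder file names standing in for the real images.
classes = ["Cleavers", "Common_Chickweed", "Maize", "Shepherds_Purse"]
samples = [(f"{c}/{i}.png", c) for c in classes for i in range(250)]
train, valid, test = split_per_class(samples)
print(len(train), len(valid), len(test))  # → 700 148 152
```

Splitting class by class keeps the label counts balanced in every set, which matters with only 250 images per class.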

Classification Network (Transfer Learning):

Used the first 2 blocks of the VGG-16 model for transfer learning.
Modified the model by adding custom layers for classification.

   # Add custom layers
   x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
   x = MaxPooling2D((2, 2))(x)
   x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
   x = MaxPooling2D((2, 2))(x)
   x = Flatten()(x)
   outputs = Dense(4, activation='softmax')(x)  # Output layer for 4 classes
  
   # Create the custom model
   classification_model = Model(inputs=vgg_model.input, outputs=outputs)
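
The snippet above starts from `x` and `vgg_model` without showing where they come from. One way they might be defined is sketched below. The wrapper function `build_classifier`, the choice of `block2_pool` as the cut point, and the freezing of the pretrained layers are assumptions on my part, not confirmed details of the notebook.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from tensorflow.keras.models import Model


def build_classifier(weights="imagenet", n_classes=4):
    """Hypothetical builder: VGG-16 blocks 1-2 plus the custom head."""
    vgg_model = VGG16(weights=weights, include_top=False, input_shape=(32, 32, 3))

    # Assumption: pretrained layers are frozen so only the new head trains.
    for layer in vgg_model.layers:
        layer.trainable = False

    # "block2_pool" closes the second convolutional block in Keras' VGG16;
    # for a 32x32 input its output is an 8x8 map with 128 channels.
    x = vgg_model.get_layer("block2_pool").output

    # Custom layers, as in the snippet above.
    x = Conv2D(256, (3, 3), activation="relu", padding="same")(x)
    x = MaxPooling2D((2, 2))(x)
    x = Conv2D(128, (3, 3), activation="relu", padding="same")(x)
    x = MaxPooling2D((2, 2))(x)
    x = Flatten()(x)
    outputs = Dense(n_classes, activation="softmax")(x)

    # Layers after block2_pool are not on the path to `outputs`,
    # so they are left out of the resulting model.
    return Model(inputs=vgg_model.input, outputs=outputs)
```

Calling `build_classifier()` downloads the ImageNet weights on first use; passing `weights=None` builds the same architecture with random initialization, which is handy for a quick shape check.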

One-hot encoded the labels.
Trained the classification model, monitored convergence, and visualized learning curves.

   batchSize = 64
   nEpochs = 100
   
   # Compile the model
   classification_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
   
   # Train the model
   history = classification_model.fit(X_train, y_train_k, batch_size=batchSize, epochs=nEpochs, verbose=1, validation_data=(X_valid, y_valid_k))
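
The `categorical_crossentropy` loss expects one-hot targets, which is why the labels are encoded before training. A minimal sketch of that encoding, assuming a fixed class order (the order and the helper name `one_hot` are illustrative, not taken from the notebook):

```python
import numpy as np

# Class order is an assumption; any fixed order works as long as it is reused
# for the training, validation, and testing labels.
CLASSES = ["Cleavers", "Common_Chickweed", "Maize", "Shepherds_Purse"]


def one_hot(labels, classes=CLASSES):
    """Return an (n_samples, n_classes) float array with a single 1 per row."""
    index = {name: i for i, name in enumerate(classes)}
    ids = np.array([index[label] for label in labels])
    return np.eye(len(classes), dtype="float32")[ids]


y_demo = one_hot(["Maize", "Cleavers"])
print(y_demo)
```

Keras also ships `tensorflow.keras.utils.to_categorical`, which performs the same conversion once the labels are integer class ids.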

Plotted and analyzed the confusion matrix for training, validation, and testing datasets.
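
A confusion matrix can be built directly from the softmax outputs and the one-hot labels. The sketch below uses NumPy only, with made-up probabilities in place of real predictions; in the project the probabilities would come from something like `classification_model.predict(X_test)`. The helper name `confusion_from_probs` is hypothetical.

```python
import numpy as np


def confusion_from_probs(y_true_onehot, y_prob):
    """Rows are true classes, columns are predicted classes."""
    y_true = np.argmax(y_true_onehot, axis=1)
    y_pred = np.argmax(y_prob, axis=1)
    n_classes = y_true_onehot.shape[1]
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats.
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix


# Tiny made-up example: 3 samples, 4 classes (not project results).
y_true_demo = np.eye(4)[[0, 1, 1]]
y_prob_demo = np.array([
    [0.7, 0.1, 0.1, 0.1],  # true 0, predicted 0
    [0.2, 0.5, 0.2, 0.1],  # true 1, predicted 1
    [0.1, 0.2, 0.6, 0.1],  # true 1, predicted 2 (a mistake)
])
cm = confusion_from_probs(y_true_demo, y_prob_demo)
print(cm)
```

Off-diagonal entries show which species get confused with each other; scikit-learn's `confusion_matrix` returns the same layout when given the integer class ids.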

Regression Network (Transfer Learning):

Loaded bounding box dimensions from the .json file.
Normalized height and width values.
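
The text does not say how the .json file is laid out or which normalization was used, so the following is only a sketch under stated assumptions: a flat list of records with `height` and `width` keys, and min-max scaling to the range [0, 1]. Both the layout and the scaling method are hypothetical.

```python
import json

# Hypothetical JSON layout; the real annotation file may be structured differently.
raw = """[
  {"file": "a.png", "height": 120, "width": 80},
  {"file": "b.png", "height": 300, "width": 200},
  {"file": "c.png", "height": 210, "width": 140}
]"""
boxes = json.loads(raw)


def min_max(values):
    """Scale values to [0, 1]; assumed normalization, not confirmed by the text."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


heights = min_max([b["height"] for b in boxes])
widths = min_max([b["width"] for b in boxes])
print(heights, widths)
```

Whatever scaling is used, the same minimum and maximum have to be applied to the predictions when converting them back to pixel units.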

Split the data into 70% training, 15% validation, and 15% testing sets.

 Training Data Size: 700
 Training Data Label Counts:
 Shepherds_Purse     175
 Common_Chickweed    175
 Cleavers            175
 Maize               175
 Name: Label, dtype: int64 
 
 ----------------------------------------------------------------
 
 Validation Data Size: 148
 Validation Data Label Counts:
 Shepherds_Purse     37
 Common_Chickweed    37
 Cleavers            37
 Maize               37
 Name: Label, dtype: int64 
 
 ----------------------------------------------------------------
 
 Test Data Size: 152
 Test Data Label Counts:
 Shepherds_Purse     38
 Common_Chickweed    38
 Cleavers            38
 Maize               38
 Name: Label, dtype: int64

Used VGG-16 for transfer learning with custom layers for regression.

 # Add custom layers
 x_regression = Conv2D(256, (3, 3), activation='relu', padding='same')(x_regression)
 x_regression = MaxPooling2D((2, 2))(x_regression)
 x_regression = Conv2D(128, (3, 3), activation='relu', padding='same')(x_regression)
 x_regression = MaxPooling2D((2, 2))(x_regression)
 x_regression = Flatten()(x_regression)
 height_output = Dense(1, activation='linear', name='height')(x_regression)
 width_output = Dense(1, activation='linear', name='width')(x_regression)

 # Create the custom regression model
 regression_model = Model(inputs=regression_vgg_model.input, outputs=[height_output, width_output])

Trained the regression model, monitored convergence, and visualized learning curves.

   batchSize = 64
   nEpochs = 100

   # Compile the model
   regression_model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mean_squared_error'])
 
   # Train the regression model
   # (batchSize is not passed to fit here, so Keras uses its default batch size of 32)
   results = regression_model.fit(X_train_regression, [y_train_height, y_train_width], epochs=nEpochs,
                                  validation_data=(X_valid_regression, [y_valid_height, y_valid_width]))

Calculated mean squared error and mean absolute error for training, validation, and testing datasets.

 22/22 [==============================] - 0s 4ms/step
 Mean Squared Error for height - Train: 0.002856253375326049, width - Train: 0.003164909554132075
 Mean Absolute Error for height - Train: 0.04336496062917911, width - Train: 0.04329592842347164
 
 5/5 [==============================] - 0s 4ms/step
 Mean Squared Error for height - Validation: 0.09055006138348325, width - Validation: 0.06981748160195345
 Mean Absolute Error for height - Validation: 0.2218270389548888, width - Validation: 0.20934197684151834
 
 5/5 [==============================] - 0s 4ms/step
 Mean Squared Error for height - Test: 0.07094346629570776, width - Test: 0.08139776182780212
 Mean Absolute Error for height - Test: 0.2150076942617718, width - Test: 0.22207330307667558
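
For reference, the two metrics reported above are the mean of the squared errors and the mean of the absolute errors. A minimal NumPy sketch, using made-up values rather than the project's predictions (the function names are illustrative):

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error."""
    # ravel() matters: predict() returns shape (n, 1), and subtracting that
    # from a (n,) target would silently broadcast to an (n, n) matrix.
    diff = np.ravel(y_true) - np.ravel(y_pred)
    return float(np.mean(diff ** 2))


def mae(y_true, y_pred):
    """Mean absolute error."""
    diff = np.ravel(y_true) - np.ravel(y_pred)
    return float(np.mean(np.abs(diff)))


# Made-up normalized heights, shaped like Keras predict() output.
y_true_demo = np.array([0.2, 0.5, 0.9])
y_pred_demo = np.array([[0.1], [0.5], [0.7]])
print(mse(y_true_demo, y_pred_demo), mae(y_true_demo, y_pred_demo))
```

scikit-learn's `mean_squared_error` and `mean_absolute_error` compute the same quantities. In the results above, the training errors are far lower than the validation and test errors, which is the kind of gap these metrics make visible.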

Model Improvement (Classification Network):

Modified the VGG-16 model by adding extra Keras layers and introduced regularization techniques such as Batch Normalization and Dropout.

 # Add custom layers with regularization
 x_new = Conv2D(256, (3, 3), activation='relu', padding='same')(x_new)
 x_new = BatchNormalization()(x_new)  # Batch Normalization layer
 x_new = MaxPooling2D((2, 2))(x_new)
 x_new = Conv2D(128, (3, 3), activation='relu', padding='same')(x_new)
 x_new = BatchNormalization()(x_new)  # Batch Normalization layer
 x_new = MaxPooling2D((2, 2))(x_new)
 x_new = Flatten()(x_new)
 x_new = Dropout(0.5)(x_new)  # Dropout layer with a dropout rate of 0.5
 outputs = Dense(4, activation='softmax')(x_new)  # Output layer for 4 classes
 
 # Create the model
 new_custom_model = Model(inputs=new_vgg_model.input, outputs=outputs)

Trained the improved classification model, monitored convergence, and visualized learning curves.

 batchSize = 35
 nEpochs = 100
 
 # Compile the model
 new_custom_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
 
 # Train the model
 newModel = new_custom_model.fit(X_train, y_train_k, batch_size=batchSize, epochs=nEpochs, verbose=1, validation_data=(X_valid, y_valid_k))
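
The object returned by `fit` (stored as `newModel` above) exposes a `history` dictionary with per-epoch values, which is what the learning curves are drawn from. A minimal plotting sketch follows; the function name `plot_learning_curves` is hypothetical and the numbers are placeholders, not results from this project.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt


def plot_learning_curves(history_dict):
    """Plot training vs. validation loss and accuracy side by side."""
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

    ax_loss.plot(history_dict["loss"], label="train")
    ax_loss.plot(history_dict["val_loss"], label="validation")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()

    ax_acc.plot(history_dict["accuracy"], label="train")
    ax_acc.plot(history_dict["val_accuracy"], label="validation")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()

    fig.tight_layout()
    return fig


# Placeholder values only, shaped like Keras' History.history.
demo_history = {
    "loss": [1.2, 0.8, 0.5],
    "val_loss": [1.3, 1.0, 0.9],
    "accuracy": [0.45, 0.70, 0.85],
    "val_accuracy": [0.40, 0.60, 0.65],
}
fig = plot_learning_curves(demo_history)
```

In the notebook this would be called as `plot_learning_curves(newModel.history)`. A validation curve that flattens or rises while the training curve keeps improving is the usual sign of overfitting, and comparing the curves before and after adding Batch Normalization and Dropout shows how much the regularization helps.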

Plotted and analyzed the confusion matrix for training, validation, and testing datasets.