/Tech-Test-ML-Hand

Image semantic hand segmentation using U-Net architecture; AItaca tech-test.

Primary Language: Jupyter Notebook

Image Hand Segmentation


About the Project

This project was built for the AItaca Tech-Test-ML-Hand. It started on 26/09/2023 and was completed on 05/10/2023, within the 2 weeks allowed.

Primary Objective:

  • Develop a segmentation model that separates hands from the background with high precision.

Secondary Objectives:

  • Implement a web-based application with Streamlit that takes user-supplied hand images as input and returns the segmented images as output.

Challenges and Project Development

  • Data Cleaning:
    • Created two folders named 'original' and 'no_bg'. Each hand image in these folders has a unique ID which is the original folder name appended with its index ([:10]).
    • Examined image sizes and ratios. Chose ratios [1.87, 1.85] to resize images to dimensions (2160, 4000) for the model.
    • Manually reviewed the 'no_bg' images, discarding those with inaccurate masks (for example, masks that included other body parts or parts of the background).
    • The final dataset consisted of 79 unique pairs of original and 'no_bg' images. The 'no_bg' images were converted into ground-truth masks for the model.
  • Data Split:
    • Initially, the dataset was split in an 80/10/10 ratio for training, validation, and testing respectively. Used Data Augmentation to produce 3 unique new images for training.
    • For the models built after the MVP, adopted a 65/15/10 split, which allocated more images to validation, and extended Data Augmentation to generate 5 unique new images for training.
  • Data Augmentation: Parameters used:
    • Brightness Range: 0.5, 1.5
    • Zoom Range: 0.2
    • Rotation Range: 30
    • Width & Height shift Range: 0.1
    • Horizontal Flip: True
    • Shear Range: 0.1
  • Image Pre-Processing: Tried the pre-processing techniques below; unprocessed images were eventually used because they yielded better performance.
    • Normalization: Pixel Normalization
    • Noise reduction: Bilateral Filtering
    • Histogram Equalization: CLAHE
    • Edge detection: Canny Edges
    • Morphological Operations: Dilation
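The split and augmentation settings listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper `split_ids`, the placeholder IDs, the seed, the rounding of the split sizes, and the reading of "3 unique new images" as three augmented copies per training image are all assumptions. Only the augmentation parameter values come from the list above.

```python
import random

from tensorflow.keras.preprocessing.image import ImageDataGenerator


def split_ids(image_ids, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle the IDs and cut them into train/validation/test lists.

    Hypothetical helper: the rounding rule is an assumption, so the
    exact counts may differ from the ones used in the project.
    """
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # local RNG, reproducible shuffle
    n_train = round(len(ids) * train_frac)
    n_val = round(len(ids) * val_frac)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


# 79 placeholder IDs stand in for the cleaned dataset.
all_ids = [f"hand_{i:03d}" for i in range(79)]
train_ids, val_ids, test_ids = split_ids(all_ids)

# Assumed meaning of "3 unique new images": copies per training image.
AUGMENTATIONS_PER_IMAGE = 3

# Augmentation parameters as listed in the README.
image_datagen = ImageDataGenerator(
    brightness_range=(0.5, 1.5),
    zoom_range=0.2,
    rotation_range=30,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    shear_range=0.1,
)

print(len(train_ids), len(val_ids), len(test_ids))
```

Only the generator's construction is shown here. For segmentation, every geometric transform (rotation, shift, shear, zoom, flip) has to be applied identically to an image and its mask, while brightness changes belong on the image only; how the notebook keeps the pairs aligned is not described in this README.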

Results

MVP (Minimum Viable Product):

image

Model: Eva VIII:

image

Model: Eva X:

Screenshot 2023-10-04 at 19 54 09

Notebooks

Conclusions:

  • The MVP was finished in the first week, on 29/09/2023, and showed promising results.
  • Pre-processing the images did not improve performance.
  • Subsequent models were architecturally more complex, with the number of filters capped at 128 for best performance. Adding dropout and residual connections helped generate better prediction masks.
  • I intended to experiment with various transfer learning models, but time constraints prevented it. Other potential improvements considered were running object detection before segmentation and fine-tuning some hyperparameters.

Toolkit:

  • JupyterLab: Environment for running Python scripts and managing files.

Libraries

📚 Basic Libraries

  • NumPy: Numeric array manipulation for images.
  • Os: File access.
  • Matplotlib: Visualization.
  • Shutil: Folder operations (copying, deleting...).
  • Random: To generate random subsets of data.
  • Warnings: Roses are red, violets are blue --> Warnings are annoying.

🌐 Computer Vision

  • TensorFlow: Machine Learning for Computer Vision.
  • Keras: High-level neural networks API for Deep Learning, running on top of TensorFlow.
  • ImageDataGenerator: To generate random data augmentation (flips, zoom...).

🐍 About Neural Networks, Deep Learning and Image Segmentation

The architecture I've implemented is a variant of the U-Net model, a popular architecture for semantic image segmentation:

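from tensorflow.keras import layers

# `inputs` is the model's Input tensor, created before this block.

# First Downsampling Block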
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.2)(x)
residual_1 = x
x = layers.MaxPooling2D((2, 2))(x)

# Second Downsampling Block
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.3)(x)
residual_2 = x
x = layers.MaxPooling2D((2, 2))(x)

# Bottleneck
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.4)(x)

# Upsampling
x = layers.Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same')(x)
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.3)(x)
x = layers.add([x, residual_2])

x = layers.Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same')(x)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.2)(x)
x = layers.add([x, residual_1])

image

  • Conv2D Layer: Convolutional layer of the network, where learnable kernels slide over the whole image to detect features such as lines and edges. 32, 64 and 128 are the number of filters used in each block, starting from a small number and doubling with depth. ReLU activation adds non-linearity to the model so it can learn more complex patterns. padding='same' ensures that the output feature map has the same width and height as the input. Batch normalization is applied after each convolution to stabilize and speed up training.
  • MaxPooling2D Layer: Downsamples the feature maps, halving their spatial dimensions (width, height) with a 2x2 window. This reduces the computational cost and can help limit overfitting.
  • Dropout Layer: Helps in preventing overfitting by randomly setting a fraction of input units to 0 at each update during training.
  • Bottleneck: The central part of the U-Net, where the feature maps are smallest. It captures the most abstract features of the image and has no skip connection of its own.
  • Conv2DTranspose Layer & Residual Connections: The transpose layers, often referred to as deconvolutional layers, upsample the features. Residual connections from the downsampling path are then added to these upsampled features (element-wise, via layers.add, rather than concatenated as in the original U-Net). This is crucial because these connections help in localizing and refining the segmented regions.
  • Output Layer: Produces the segmented image. The sigmoid activation function ensures pixel values are in the [0,1] range.
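To make the layer descriptions above concrete, here is a deliberately reduced sketch of how such a network can be closed with a sigmoid output layer. It is not the project's model: the function `build_tiny_unet`, the 64x64x3 input size, the single encoder level, the 1x1 output kernel, and the 0.5 threshold used to binarize the prediction are all assumptions made for illustration.

```python
import numpy as np
from tensorflow.keras import Model, layers


def build_tiny_unet(input_shape=(64, 64, 3)):
    """One-level U-Net-style sketch ending in a sigmoid output layer."""
    inputs = layers.Input(shape=input_shape)

    # Encoder: 'same' padding keeps height and width unchanged.
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
    x = layers.BatchNormalization()(x)
    residual = x                          # saved for the skip connection
    x = layers.MaxPooling2D((2, 2))(x)    # 64x64 -> 32x32

    # Bottleneck: smallest feature maps, no skip connection of its own.
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.3)(x)

    # Decoder: the transposed convolution doubles height and width,
    # so the shapes match and the residual can be added element-wise.
    x = layers.Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same')(x)
    x = layers.add([x, residual])

    # Output layer: one channel, sigmoid keeps every pixel in [0, 1].
    outputs = layers.Conv2D(1, (1, 1), activation='sigmoid')(x)
    return Model(inputs, outputs)


model = build_tiny_unet()

# Random data stands in for a batch of two RGB images.
batch = np.random.rand(2, 64, 64, 3).astype('float32')
probabilities = model.predict(batch, verbose=0)

# Assumed post-processing: threshold the probabilities into a 0/1 mask.
binary_mask = (probabilities > 0.5).astype(np.uint8)
print(model.output_shape, binary_mask.shape)
```

The model here is untrained, so the mask values are meaningless; the point is only that the output has the same height and width as the input, with one probability per pixel.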

Bibliography: