We use a modified ResNet50 model to detect cracks in concrete structures. Larger images are cut into equal-sized patches. Both images and video can be given as input, although video input may lag.
Dataset: https://www.kaggle.com/datasets/xinzone/surface-crack
Training Data - 2 classes (Positive and Negative), 300 images per class, each of size 224x224
Test Data - 2 classes (Positive and Negative), 100 images per class, each of size 224x224
Validation Data - 2 classes (Positive and Negative), 100 images per class, each of size 224x224
Predict Set - 6 images of size 4800x3200
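The patch step for large images can be sketched as follows. This is a minimal NumPy illustration, not the project's actual code: the function name `split_into_patches`, the zero-padding of edges that do not divide evenly by 224, and the (height, width, channels) array layout are all assumptions.

```python
import numpy as np

PATCH = 224  # patch side length, matching the 224x224 training images


def split_into_patches(image, patch=PATCH):
    """Split an (H, W, C) image into non-overlapping patch x patch tiles.

    Assumption: edges that do not divide evenly are zero-padded on the
    bottom and right. Returns (patches, (rows, cols)), where patches has
    shape (rows * cols, patch, patch, C) in row-major tile order.
    """
    h, w = image.shape[:2]
    pad_h = (-h) % patch  # rows needed to reach a multiple of patch
    pad_w = (-w) % patch  # columns needed to reach a multiple of patch
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="constant")
    rows = padded.shape[0] // patch
    cols = padded.shape[1] // patch
    patches = (
        padded.reshape(rows, patch, cols, patch, -1)
        .swapaxes(1, 2)  # -> (rows, cols, patch, patch, C)
        .reshape(rows * cols, patch, patch, -1)
    )
    return patches, (rows, cols)


if __name__ == "__main__":
    # Stand-in for one Predict Set image (4800 wide, 3200 high, RGB assumed).
    big = np.zeros((3200, 4800, 3), dtype=np.uint8)
    patches, grid = split_into_patches(big)
    print(patches.shape, grid)
```

Under these assumptions a 4800x3200 image is padded to 4928x3360 and yields a 15 x 22 grid of 224x224 patches. Dropping the incomplete edge tiles or resizing the image first would be equally plausible choices.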
Because each class contains only a small number of images, we apply data augmentation.
Methods:
- rotation_range = 3
- vertical_flip
- horizontal_flip
- brightness_range = (0.5, 1.2)
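The option names above match the arguments of Keras' `ImageDataGenerator`, so the augmentation could plausibly be configured as in the sketch below. That this class was actually used is an assumption, as are the directory path `data/train`, the batch size, the `categorical` class mode, and the helper name `make_train_iterator`. Recent TensorFlow releases mark `ImageDataGenerator` as deprecated in favor of preprocessing layers.

```python
# Augmentation settings from the list above.
# Assumption: the two flip options are simply switched on (True).
AUGMENTATION_KWARGS = {
    "rotation_range": 3,             # random rotation, in degrees
    "vertical_flip": True,
    "horizontal_flip": True,
    "brightness_range": (0.5, 1.2),  # range for random brightness shifts
}


def make_train_iterator(train_dir="data/train", batch_size=32):
    """Return an iterator of augmented batches read from train_dir.

    train_dir is a hypothetical path with one sub-folder per class
    (for example Positive/ and Negative/).
    """
    # Imported lazily so the settings can be inspected without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(**AUGMENTATION_KWARGS)
    return datagen.flow_from_directory(
        train_dir,
        target_size=(224, 224),    # image size given in the dataset description
        class_mode="categorical",  # one-hot labels, to pair with a 2-way softmax
        batch_size=batch_size,
    )
```

The iterator generates augmented images on the fly. Producing a fixed-size augmented set on disk would need an extra step, for example the `save_to_dir` argument of `flow_from_directory`.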
After Augmentation:
Training Data: augmented to 23,907 images in each class
Test Data: augmented to 7,232 images in each class
Validation Data: augmented to 7,248 images in each class
Model Architecture:
Model: ResNet50 (Modified)
Subsequent layers: Dense layer of size 2048, Dense layer of size 128, softmax layer for 2 classes
Optimizer: Adam - learning rate 0.001
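One way to express this architecture in Keras is sketched below. Only the ResNet50 backbone, the layer sizes, the 2-class softmax, and the Adam learning rate come from the description above. The global average pooling, the ReLU activations, the 3-channel input, the loss function, the `weights` default, and the name `build_model` are assumptions, and the project's own modifications to ResNet50 may differ.

```python
INPUT_SHAPE = (224, 224, 3)  # 224x224 from the dataset; 3 channels assumed
HEAD_UNITS = (2048, 128)     # Dense layers placed after the backbone
NUM_CLASSES = 2              # Positive / Negative
LEARNING_RATE = 0.001        # Adam learning rate


def build_model(weights=None):
    """Build and compile a ResNet50 backbone with the dense head described above.

    weights=None gives random initialization; passing "imagenet" would load
    pretrained weights. Which one the project used is not stated.
    """
    # Imported lazily so the constants can be inspected without TensorFlow.
    import tensorflow as tf

    backbone = tf.keras.applications.ResNet50(
        include_top=False,  # drop the original 1000-class classifier
        weights=weights,
        input_shape=INPUT_SHAPE,
        pooling="avg",      # assumption: global average pooling before the head
    )
    x = backbone.output
    for units in HEAD_UNITS:
        x = tf.keras.layers.Dense(units, activation="relu")(x)  # ReLU assumed
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = tf.keras.Model(inputs=backbone.input, outputs=outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
        loss="categorical_crossentropy",  # assumed; pairs with one-hot labels
        metrics=["accuracy"],
    )
    return model
```

Calling `build_model().summary()` prints the layer stack, which is a quick way to check the head sizes against the description.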
Metrics for the trained model:
Sample training images:
Sample prediction with image: