Project Resolution:

This project was successfully completed. The custom CNN reached an F1 score above 91%, but at 255 MB the model was too large for a smartphone app. Several pre-trained models were also tested; of these, only MobileNet reached the 91% F1 target while keeping a model size suitable for smartphones (25 MB).
The re-trained MobileNet model was deployed with Gradio and can be tested at: https://1a3f3ea116b9c5b00a.gradio.live

Objectives

MonReader is a new mobile document digitization experience for the blind, for researchers, and for anyone else who needs fully automatic, fast, high-quality document scanning in bulk. It consists of a mobile app; all the user needs to do is flip pages, and MonReader handles the rest. It detects page flips from the low-resolution camera preview and takes a high-resolution picture of the document, recognizes its corners and crops it accordingly, dewarps the cropped document to obtain a bird's-eye view, and sharpens the contrast between the text and the background. Finally, it recognizes the text with formatting kept intact, and the result is further corrected by MonReader's ML-powered redactor.

The data consists of page-flipping videos recorded on smartphones.
The videos were clipped into short segments, each labelled as flipping or not flipping. The frames extracted from each clip were saved to disk in sequential order with the following naming structure: VideoID_FrameNumber
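
As an illustration of the VideoID_FrameNumber naming structure, the following is a minimal sketch of how frame files could be parsed and grouped back into per-video sequences. The function names, the .jpg extension, and the sample file names are assumptions made for this example; they are not taken from the project code.

```python
from collections import defaultdict
from pathlib import Path


def parse_frame_name(filename):
    """Split 'VideoID_FrameNumber.ext' into (video_id, frame_number)."""
    stem = Path(filename).stem
    # rpartition splits on the LAST underscore, so a video ID that itself
    # contains underscores is kept whole.
    video_id, sep, frame = stem.rpartition("_")
    if not sep or not video_id or not frame.isdecimal():
        raise ValueError(f"unexpected frame name: {filename!r}")
    return video_id, int(frame)


def group_frames_by_video(filenames):
    """Return {video_id: [filenames sorted by numeric frame number]}."""
    groups = defaultdict(list)
    for name in filenames:
        video_id, frame_number = parse_frame_name(name)
        groups[video_id].append((frame_number, name))
    # Sorting on the integer frame number avoids the 'frame 10 before
    # frame 2' problem of plain string sorting.
    return {vid: [n for _, n in sorted(frames)] for vid, frames in groups.items()}


# Hypothetical file names, for illustration only.
example = ["0001_10.jpg", "0001_2.jpg", "0002_7.jpg"]
frames_by_video = group_frames_by_video(example)
```

Grouping by video ID restores the sequential order of each clip even when the frame numbers are not zero-padded.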

Predict from a single image whether the page is being flipped, using a custom CNN model.
Predict from a single image whether the page is being flipped, using pre-trained ResNet, VGG16, and MobileNet models.

Evaluate model performance based on the F1 score: the higher the better, but it should be higher than 91%.
The model should also have a final size smaller than 40 MB so that it can fit in a smartphone app.
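
The two acceptance criteria above can be sketched as a small check. This is only an illustration under stated assumptions: the label encoding (1 = flipping), the function names, the choice of 1 MB = 1,000,000 bytes, and the sample predictions are all hypothetical. Only the 91% and 40 MB thresholds come from the objectives.

```python
import os

F1_TARGET = 0.91          # objective: F1 score higher than 91%
MAX_MODEL_SIZE_MB = 40.0  # objective: model smaller than 40 MB


def f1_score_binary(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # With no positive labels and no positive predictions, F1 is
    # undefined; returning 0.0 here is a convention, not a rule.
    return 2 * tp / denom if denom else 0.0


def file_size_mb(path):
    """Size of a saved model file in MB (assuming 1 MB = 10**6 bytes)."""
    return os.path.getsize(path) / 1_000_000


def meets_requirements(f1, size_mb):
    """True only if both the F1 target and the size limit are satisfied."""
    return f1 > F1_TARGET and size_mb < MAX_MODEL_SIZE_MB


# Hypothetical labels and predictions (1 = flipping, 0 = not flipping).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
score = f1_score_binary(y_true, y_pred)  # 3 TP, 1 FP, 1 FN -> 0.75
```

With these thresholds, a model with an F1 score above 0.91 passes at 25 MB but fails at 255 MB, which matches the outcome reported for MobileNet and the custom CNN.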