- This project was completed successfully. The custom CNN reached an F1 score above 91%, but the saved model was considerably large at 255 MB. Several pre-trained models were also tested; of these, only MobileNet reached the 91% F1 target while keeping a model size suitable for smartphones (25 MB).
- The re-trained MobileNet model was deployed with Gradio and can be tested at: https://1a3f3ea116b9c5b00a.gradio.live
- MonReader is a new mobile document digitization experience for the blind, for researchers, and for anyone else who needs fully automatic, fast, high-quality bulk document scanning. It consists of a mobile app: the user only has to flip the pages, and MonReader handles everything else. It detects page flips from the low-resolution camera preview, takes a high-resolution picture of the document, detects its corners and crops it accordingly, dewarps the cropped document to obtain a bird's-eye view, sharpens the contrast between the text and the background, and finally recognizes the text with its formatting intact, which is then further corrected by MonReader's ML-powered redactor.
- Page-flipping videos recorded on smartphones, labelled as flipping or not flipping.
- The videos were cut into short clips and labelled as flipping or not flipping. The extracted frames were then saved to disk in sequential order using the naming structure `VideoID_FrameNumber`.
- Using a custom CNN model, predict whether the page is being flipped from a single image.
- Using pre-trained ResNet, VGG16, and MobileNet models, predict whether the page is being flipped from a single image.
- Evaluate model performance with the F1 score: the higher the better, and it must exceed 91%.
- The model should also have a final size under 40 MB so that it fits in a smartphone app.
- Predict whether a given sequence of images contains a page-flipping action.
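
Because frames are saved as `VideoID_FrameNumber`, each clip can be rebuilt by grouping file names on the video ID and sorting numerically on the frame number. The following is a minimal sketch, assuming the frame number is the text after the last underscore and that files carry an image extension such as `.jpg`; the helper name `group_frames` and the sample file names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def group_frames(filenames):
    """Group frame files by video ID and order each group by frame number.

    Assumes names follow VideoID_FrameNumber.<ext>; the video ID is taken
    to be everything before the last underscore.
    """
    videos = defaultdict(list)
    for name in filenames:
        stem = Path(name).stem                      # drop the file extension
        video_id, frame_str = stem.rsplit("_", 1)   # split on the last underscore
        videos[video_id].append((int(frame_str), name))
    # Sort numerically so frame 10 comes after frame 9, not after frame 1
    return {vid: [name for _, name in sorted(frames)] for vid, frames in videos.items()}


frames = ["0001_10.jpg", "0001_2.jpg", "0002_1.jpg", "0001_1.jpg"]
print(group_frames(frames))
# {'0001': ['0001_1.jpg', '0001_2.jpg', '0001_10.jpg'], '0002': ['0002_1.jpg']}
```

Sorting on the parsed integer rather than on the raw string matters here, since a plain string sort would place `0001_10.jpg` before `0001_2.jpg`.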
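
The two success criteria, an F1 score above 91% and a model file smaller than 40 MB, can be checked together after training. This is a minimal sketch in plain Python, assuming binary labels where `1` means flipping and that "MB" means 10^6 bytes; the helper names `f1_score` and `meets_targets` are hypothetical (scikit-learn's `sklearn.metrics.f1_score` computes the same score for binary labels).

```python
import os


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the positive ("flipping") class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def meets_targets(y_true, y_pred, model_path, min_f1=0.91, max_mb=40.0):
    """Return (passed, f1, size_mb) for the F1 and model-size criteria."""
    f1 = f1_score(y_true, y_pred)
    size_mb = os.path.getsize(model_path) / 1e6  # assumes 1 MB = 10**6 bytes
    return f1 > min_f1 and size_mb < max_mb, f1, size_mb


y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 1, 0]  # one false positive, no false negatives
print(round(f1_score(y_true, y_pred), 3))  # 0.857
```

Under these assumptions, the 255 MB custom CNN would fail the size check even with a passing F1 score, while a 25 MB MobileNet file would pass it.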
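
One way to extend a single-image classifier to a sequence is to score each frame and then decide on the whole clip. The rule below is an assumption, not the project's method: a sequence counts as flipping when at least `min_run` consecutive frames have a flip probability of at least `threshold`, which ignores isolated false positives. The function name and both parameter values are hypothetical and would need tuning on the labelled clips.

```python
def sequence_is_flipping(frame_probs, threshold=0.5, min_run=3):
    """Return True if at least `min_run` consecutive frames score >= threshold.

    frame_probs: per-frame flip probabilities in frame order, e.g. the
    sigmoid outputs of a single-image classifier (hypothetical input).
    """
    run = 0
    for p in frame_probs:
        run = run + 1 if p >= threshold else 0  # extend or reset the current run
        if run >= min_run:
            return True
    return False


print(sequence_is_flipping([0.1, 0.8, 0.2, 0.9, 0.7]))   # False: longest run is 2 frames
print(sequence_is_flipping([0.1, 0.6, 0.9, 0.95, 0.3]))  # True: 3 frames in a row >= 0.5
```

Requiring a run of frames, rather than averaging all probabilities, keeps a short flip inside a long clip from being diluted by the surrounding static frames.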