📸 Camera object recognizer with integrated object-to-voice. 🤖
✅ This project has diagrams to help you understand how it works! 📌
ScannerCam is a web application that uses the TensorFlow.js machine learning library to detect objects in real time with the camera of a mobile device or a computer. Built with Next.js, React, TypeScript, TailwindCSS, and Playwright.
```mermaid
graph TD
A([User clicks Start app]) -->|Downloading TensorFlow model...| B[Model downloaded]
B --> |App can work offline from here...|C[Camera button available]
C --> D([User clicks on camera button])
D --> E{Is camera access allowed?}
E --> |No|C
E --> |Yes|G[React-Webcam component rendered]
G --> H[Tensorgram mechanism is started...]
```
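For context, here is a minimal sketch of the camera step from the diagram above, assuming the `react-webcam` package; the component, state, and button are illustrative, not the project's actual code:

```tsx
// Sketch of the camera step, assuming the react-webcam package.
import { useRef, useState } from 'react';
import Webcam from 'react-webcam';

export function CameraView() {
  // The detection loop can read frames via webcamRef.current?.video.
  const webcamRef = useRef<Webcam>(null);
  // 'user' = front camera, 'environment' = rear camera on mobile devices.
  const [facingMode, setFacingMode] = useState<'user' | 'environment'>('environment');

  return (
    <>
      <Webcam ref={webcamRef} audio={false} videoConstraints={{ facingMode }} />
      <button onClick={() => setFacingMode((m) => (m === 'user' ? 'environment' : 'user'))}>
        Switch camera
      </button>
    </>
  );
}
```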
These are the main technologies used to build ScannerCam:
ScannerCam is a web application built around a camera module (which can switch between the environment and front cameras on mobile devices) whose frames are fed in real time to TensorFlow.js's COCO-SSD model (Common Objects in Context; Single Shot MultiBox Detector) to detect up to 80 object classes.
"We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location." From the SSD: Single Shot MultiBox Detector paper.
When the user clicks the "Start app" button, the app requests the machine learning model from the TensorFlow.js Hub, which responds with the model's architecture and weights.
Once the model is ready, it does not need to be downloaded again: it is stored in the browser's cache, so the app can work offline after the first load.
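A minimal sketch of what that load step could look like, assuming the official `@tensorflow-models/coco-ssd` package (the `loadModel` helper is illustrative):

```ts
// Sketch of the model-loading step, using the official coco-ssd package.
import '@tensorflow/tfjs'; // registers the TF.js backend
import * as cocoSsd from '@tensorflow-models/coco-ssd';

let model: cocoSsd.ObjectDetection | null = null;

// Called when the user clicks "Start app": downloads the architecture and
// weights once; subsequent loads are served from the browser's cache.
export async function loadModel(): Promise<cocoSsd.ObjectDetection> {
  if (!model) {
    model = await cocoSsd.load();
  }
  return model;
}
```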
If you want to know more about the COCO-SSD model, you can read the TensorFlow.js documentation.
```mermaid
graph TD
A(Tensorgram mechanism) --> B[Get the frames from the HTML video element]
B --> |Sent to the TensorFlow.js model|C[(COCO SSD Detection Model)]
C --> |Returns|D([Array of detections])
D --> E[\Get classes detected/]
E --> |Update useState|G[(Detections Storage)]
D --> H[Create Diagrams with createTensorgram]
H --> I[Create an overlay element on top of the React-Webcam component]
I --> J[Paint boxes with text in the overlay element created]
```
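A hedged sketch of what such a detection loop could look like with the COCO-SSD API (`model.detect` returns an array of `{ bbox, class, score }` objects); the function and callback names are illustrative, not the project's actual code:

```ts
// Sketch of the detection loop the diagram calls the "Tensorgram mechanism".
import type { ObjectDetection, DetectedObject } from '@tensorflow-models/coco-ssd';

export function startDetectionLoop(
  model: ObjectDetection,
  video: HTMLVideoElement,
  overlay: HTMLCanvasElement,
  onDetections: (classes: string[]) => void, // e.g. a React useState setter
) {
  const ctx = overlay.getContext('2d')!;

  const tick = async () => {
    // 1. Feed the current video frame to the COCO-SSD model.
    const detections: DetectedObject[] = await model.detect(video);

    // 2. Store the detected classes (the "Detections Storage" above).
    onDetections(detections.map((d) => d.class));

    // 3. Paint labeled boxes on the overlay element.
    ctx.clearRect(0, 0, overlay.width, overlay.height);
    for (const { bbox: [x, y, w, h], class: label, score } of detections) {
      ctx.strokeStyle = '#D81B60';
      ctx.strokeRect(x, y, w, h);
      ctx.fillStyle = '#D81B60';
      ctx.fillText(`${label} (${Math.round(score * 100)}%)`, x, y - 4);
    }

    requestAnimationFrame(tick);
  };

  requestAnimationFrame(tick);
}
```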
It also uses the Speech Synthesis API to announce the objects detected on camera while the option is active. This feature is called object-to-voice.
Both features have internationalization support for English and Spanish. The language is selected automatically from the user's preferred browser language.
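A minimal sketch of the object-to-voice idea with the browser's Speech Synthesis API, picking the utterance language from the browser preference as described above (the function name is illustrative):

```ts
// Sketch of object-to-voice using the standard Speech Synthesis API.
export function speakDetections(classes: string[]) {
  if (!('speechSynthesis' in window) || classes.length === 0) return;

  const utterance = new SpeechSynthesisUtterance(classes.join(', '));
  utterance.lang = navigator.language; // e.g. 'en-US' or 'es-ES'
  window.speechSynthesis.cancel(); // drop any queued speech before speaking
  window.speechSynthesis.speak(utterance);
}
```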
ScannerCam UI was built on React v18 with TypeScript.
The UI is responsive and comes with light and dark modes based on the user's preferred color scheme in the browser.
ScannerCam is continuously tested with Playwright. Playwright is a Node.js library to automate Chromium, Firefox, and WebKit with a single API.
More than 10 assertions are made to ensure that the app is working correctly.
The tests are located in the `tests` folder.
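As an illustration, a Playwright test for this app might look like the following; the selector and test title are hypothetical, not taken from the actual `tests` folder:

```ts
// Hypothetical example in the style of a Playwright end-to-end test.
import { test, expect } from '@playwright/test';

test('start button is visible on the landing page', async ({ page }) => {
  await page.goto('/'); // baseURL comes from playwright.config
  await expect(page.getByRole('button', { name: /start/i })).toBeVisible();
});
```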
ScannerCam is deployed on Vercel. Vercel is a cloud platform for static sites, hybrid apps, and Serverless Functions.
ScannerCam has a simple and clean design. It is based on the Material Design guidelines.
The library used to build the interface is TailwindCSS. It is a utility-first CSS framework for rapidly building custom user interfaces.
The colors chosen to paint ScannerCam are shades of red that change depending on the user's dark mode or light mode preference:
- Red darker: `#6A0012`
- Red dark: `#A00037`
- Red candydark: `#D81B60`
- Red candylight: `#FF5C8D`
- Red light: `#FF90BD`
- Red lighter: `#FFC2EF`
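As an illustration, here is how this palette could be registered in a `tailwind.config.ts`, with `darkMode: 'media'` so Tailwind follows the user's preferred color scheme as described above (the color key names are assumptions, not the project's actual config):

```ts
// Sketch of a Tailwind config exposing the ScannerCam palette.
import type { Config } from 'tailwindcss';

const config: Config = {
  content: ['./src/**/*.{ts,tsx}'],
  darkMode: 'media', // follow the browser/OS color-scheme preference
  theme: {
    extend: {
      colors: {
        'red-darker': '#6A0012',
        'red-dark': '#A00037',
        'red-candydark': '#D81B60',
        'red-candylight': '#FF5C8D',
        'red-light': '#FF90BD',
        'red-lighter': '#FFC2EF',
      },
    },
  },
};

export default config;
```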
Would you like to contribute? Do you want to be the author of a new feature? Awesome! Please fork the repository and make changes as you like. Pull requests are warmly welcome.
Distributed under the MIT License.
See `LICENSE` for more information.