Machine Learning Plugin for Capacitor. Currently offered implementations include:

- Text Detector: text detection in still images.
On the iOS side we use Apple's Vision framework, and on the Android side Firebase MLKit's vision APIs. Both have some limitations, such as being unable to reliably detect cursive/handwritten text.

TextDetector expects the image to be sent in portrait mode only, i.e. with the text facing up. It will try to process the image even otherwise, but note that this might result in gibberish.
Feature | iOS | Android |
---|---|---|
ML Framework | CoreML Vision | Firebase MLKit |
Text Detection with Still Images | Yes | Yes |
Detects lines of text | Yes | Yes |
Bounding Coordinates for Text | Yes | Yes |
Image Orientation | Yes (Up, Left, Right, Down) | Yes (Up, Left, Right, Down) |
Skewed Text | Yes | Unreliable |
Rotated Text (<~ 45deg) | Yes | Yes (but with noise) |
SDK/OS Version | iOS 13.0 or newer | Targets API level >= 16; Gradle >= 4.1; com.android.tools.build:gradle >= v3.2.1; compileSdkVersion >= 28 |
On-Device | Yes | Yes |
```
npm install cap-ml
```
TextDetector exposes only one method, detectText, which returns a Promise with an array of text detections:

```ts
// Orientation here is not the current orientation of the image, but the
// direction in which the image should be turned to make it upright.
detectText(filename: string, orientation?: ImageOrientation): Promise<TextDetection[]>
```
TextDetection looks like:

```ts
interface TextDetection {
  bottomLeft: [number, number]; // [x-coordinate, y-coordinate]
  bottomRight: [number, number]; // [x-coordinate, y-coordinate]
  topLeft: [number, number]; // [x-coordinate, y-coordinate]
  topRight: [number, number]; // [x-coordinate, y-coordinate]
  text: string;
}
```
ImageOrientation is an enum:

```ts
enum ImageOrientation {
  Up = "UP",
  Down = "DOWN",
  Left = "LEFT",
  Right = "RIGHT",
}
```
bottomLeft, bottomRight, topLeft, and topRight provide the [x, y] coordinates of the corners of the bounding quadrangle for the detected text. Often this will be a rectangle, but the text might be skewed.
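Since the quadrangle is not necessarily axis-aligned, it can be handy to reduce it to a plain rectangle before drawing overlays. A minimal sketch — `boundingRect` is a hypothetical helper, not part of cap-ml:

```typescript
// Shape returned by the plugin for each detected run of text.
interface TextDetection {
  bottomLeft: [number, number]; // [x-coordinate, y-coordinate]
  bottomRight: [number, number];
  topLeft: [number, number];
  topRight: [number, number];
  text: string;
}

// Compute the smallest axis-aligned rectangle containing all four corners,
// which is easier to feed to plain rect-drawing APIs than a quadrangle.
function boundingRect(d: TextDetection): { x: number; y: number; width: number; height: number } {
  const xs = [d.bottomLeft[0], d.bottomRight[0], d.topLeft[0], d.topRight[0]];
  const ys = [d.bottomLeft[1], d.bottomRight[1], d.topLeft[1], d.topRight[1]];
  const x = Math.min(...xs);
  const y = Math.min(...ys);
  return { x, y, width: Math.max(...xs) - x, height: Math.max(...ys) - y };
}
```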
```ts
import { Plugins, CameraResultType, CameraSource } from '@capacitor/core';
const { Camera } = Plugins;

import { TextDetector, TextDetection } from 'cap-ml';
```
and used like:

```ts
// Prompt the user to select a picture.
const imageFile = await Camera.getPhoto({
  resultType: CameraResultType.Uri,
  source: CameraSource.Photos,
});

// Pass the picture to the CapML plugin.
const td = new TextDetector();
const textDetections = await td.detectText(imageFile.path!);

// Or with orientation:
// const textDetections = await td.detectText(imageFile.path!, ImageOrientation.Up);

// textDetections is an array of detected texts and corresponding bounding box
// coordinates, which can be accessed like:
textDetections.forEach((detection: TextDetection) => {
  const text = detection.text;
  const bottomLeft = detection.bottomLeft;
  const bottomRight = detection.bottomRight;
  const topLeft = detection.topLeft;
  const topRight = detection.topRight;
});
```
If you're using the plugin in an Android app (generated through Ionic), there is an additional step: make sure to register the plugin in the app's MainActivity.java.

- Import the plugin: import com.bendyworks.capML.CapML;
- Register the plugin: in the same file, inside onCreate's init, add add(CapML.class);
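Put together, the registration looks roughly like this. This is a sketch for a Capacitor 2-style MainActivity; everything except the CapML import comes from the standard Capacitor Android template, and your generated file may differ slightly:

```java
// MainActivity.java (sketch; package name is your app's own)
package com.example.app;

import android.os.Bundle;
import com.getcapacitor.BridgeActivity;
import com.getcapacitor.Plugin;
import com.bendyworks.capML.CapML;
import java.util.ArrayList;

public class MainActivity extends BridgeActivity {
  @Override
  public void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    // Register the CapML plugin alongside any other plugins your app uses.
    this.init(savedInstanceState, new ArrayList<Class<? extends Plugin>>() {{
      add(CapML.class);
    }});
  }
}
```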
A complete example can be found in the examples folder - examples/text-detection/ImageReader
If you're planning to use the Camera Plugin like in the example project, or to use an image from the Photo Library:

For iOS:
- Open the app in Xcode by running `npx cap open ios` from the sample app's root directory (here, examples/text-detection/ImageReader).
- Open Info.plist.
- Add the corresponding permissions to the app:
  - Privacy - Camera Usage Description: To Take Photos and Video
  - Privacy - Photo Library Additions Usage Description: Store camera photos to camera
  - Privacy - Photo Library Usage Description: To Pick Photos from Library
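In raw plist form, those three entries look like the following (the usage strings are only examples; word them for your app):

```xml
<key>NSCameraUsageDescription</key>
<string>To Take Photos and Video</string>
<key>NSPhotoLibraryAddUsageDescription</key>
<string>Store camera photos to camera</string>
<key>NSPhotoLibraryUsageDescription</key>
<string>To Pick Photos from Library</string>
```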
For Android:
- Open the app in Android Studio by running `npx cap open android` from the sample app's root directory (here, examples/text-detection/ImageReader).
- Open app/manifests/AndroidManifest.xml.
- Add the corresponding permissions to the app:
  - android.permission.INTERNET
  - android.permission.READ_EXTERNAL_STORAGE
  - android.permission.WRITE_EXTERNAL_STORAGE
  - android.permission.CAMERA
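As manifest entries, these go inside the `<manifest>` element:

```xml
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.CAMERA" />
```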
Note: The sample app is set up to download Firebase's OCR model for text detection upon installing the app. The app may error out with something like:

```
Considering local module com.google.android.gms.vision.ocr:0 and remote module com.google.android.gms.vision.ocr:0. E/Vision: Error loading module com.google.android.gms.vision.ocr optional module true: com.google.android.gms.dynamite.DynamiteModule$LoadingException: No acceptable module found. Local version is 0 and remote version is 0.
```

This is a known bug with Google Play Services. Follow these steps:
- Uninstall the app from the device/emulator.
- Update 'Google Play Services' - make sure you have the latest version.
- Clear cache and storage for 'Google Play Services'.
- Restart the device/emulator.
- Install and run the app.
After checking out the repo:
- Run `npm install` to install dependencies. The plugin should be ready at this point.

To test it out:
- Navigate to examples/text-detection/ImageReader.
- Run `npm install` to install dependencies.
- Run `npm run build && npx cap sync` to sync the project with iOS and Android.

For iOS:
- Run `npx capacitor open ios` to open an Xcode project.
- Run the Xcode project either on a simulator or on a device.
- For each change in the JavaScript part of the app, run `npm run build && npx cap sync ios` to deploy the corresponding changes to the iOS app, or
- (recommended) enable live reload of the app using `ionic capacitor run ios --livereload`.
- Plugin code is located at Pods/DevelopmentPods/CapML. Plugin.swift is the entry point to the plugin.

For Android:
- Run `npx capacitor open android` to open an Android Studio project.
- Navigate to https://console.firebase.google.com/ and sign in.
- Click 'Add Project' and follow through the steps (enable Google Analytics if you like, but the project doesn't particularly need it).
- Once the project is created, click on the Android icon to create an Android app.
- Register the app:
  - Enter the package name; this should be the same as the package name of your app. For example, the package name in the example project here is com.bendyworks.CapML.ImageReader; enter that if you wish to run the sample project. If you're setting up a new project, enter the package name of that app. (Tip: you can find it in app/AndroidManifest.xml.) Click 'Register App'.
  - Download google-services.json and place it in your project's app directory.
- Add the Firebase SDK: the example project should already have this in place, but if you're setting up a new project, follow the instructions to modify build.gradle to use the downloaded google-services.json.
- Once the build changes are in place, perform a Gradle sync. (Android Studio will prompt for a Gradle sync as soon as a change is made to build files.)
- The example project is already set up to use the plugin, but if you're setting up a new project, in the project's MainActivity.java:
  - Import the plugin: import com.bendyworks.capML.CapML;
  - Register the plugin: in the same file, inside onCreate's init, add add(CapML.class);
- Build and run the project either on a simulator or on a device.
- For each change in the JavaScript part of the app, run `npm run build && npx cap sync android` to deploy the corresponding changes to the Android app, or
- (recommended) enable live reload of the app using `ionic capacitor run android --livereload`.
- Plugin code is located at android-cap-ml/java/com.bendyworks.capML. CapML.java is the entry point to the plugin. (Note: when plugin code is updated, make sure to rebuild the project before running it.)
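The 'Add the Firebase SDK' step above boils down to the standard google-services wiring. A sketch — the version number is illustrative; match it to Firebase's current setup instructions:

```groovy
// Project-level build.gradle
buildscript {
  dependencies {
    // Enables processing of the downloaded google-services.json.
    classpath 'com.google.gms:google-services:4.3.3'
  }
}

// App-level app/build.gradle (at the bottom of the file)
apply plugin: 'com.google.gms.google-services'
```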
Bug reports and pull requests are welcome on GitHub at https://github.com/bendyworks/cap-ml.
If you're curious about the implementation, here's an extensive blog post series - https://bendyworks.com/blog/capacitor-plugin-for-text-detection-part1
Hippocratic License Version 2.0.
For more information, refer to the LICENSE file.