
Text Detection Capacitor Plugin


Cap-ML

Machine Learning Plugin for Capacitor. Currently offered implementations include -

  • Text Detector: Text Detection in still images.

    On the iOS side, we're using Apple's Vision framework, and on the Android side, Firebase MLKit's Vision framework. Both have limitations, such as not being able to detect cursive or handwritten text.

    TextDetector expects the image in portrait mode, i.e. with the text facing up. It will still try to process other images, but the result may be gibberish.

Compatibility Chart

| Feature | iOS | Android |
| --- | --- | --- |
| ML Framework | CoreML Vision | Firebase MLKit |
| Text Detection with Still Images | Yes | Yes |
| Detects lines of text | Yes | Yes |
| Bounding Coordinates for Text | Yes | Yes |
| Image Orientation | Yes (Up, Left, Right, Down) | Yes (Up, Left, Right, Down) |
| Skewed Text | Yes | Unreliable |
| Rotated Text (< ~45 deg) | Yes | Yes (but with noise) |
| On-Device | Yes | Yes |
| SDK/OS Version | iOS 13.0 or newer | Targets API level >= 16; uses Gradle >= 4.1, com.android.tools.build:gradle >= v3.2.1, compileSdkVersion >= 28 |
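The chart lists rotated text as supported up to roughly 45 degrees. As a minimal sketch, assuming the TextDetection shape described under Usage, an app could estimate each detection's rotation from the angle of its top edge (topLeft to topRight) and flag detections past that limit. The helper names rotationDegrees and isBeyondRotationLimit are hypothetical and not part of cap-ml, and the sign of the angle depends on the plugin's y-axis direction, which this README does not specify.

```typescript
// Local copy of the TextDetection shape from the Usage section
// (assumption: in an app you would import it from 'cap-ml' instead).
type Point = [number, number]; // [x-coordinate, y-coordinate]

interface TextDetection {
  bottomLeft: Point;
  bottomRight: Point;
  topLeft: Point;
  topRight: Point;
  text: string;
}

// Hypothetical helper: angle in degrees (between -180 and 180) of the top edge
// relative to the x-axis. 0 means the top edge is horizontal. Whether a positive
// angle is clockwise or counterclockwise depends on the y-axis direction.
function rotationDegrees(d: TextDetection): number {
  const dx = d.topRight[0] - d.topLeft[0];
  const dy = d.topRight[1] - d.topLeft[1];
  return (Math.atan2(dy, dx) * 180) / Math.PI;
}

// Hypothetical helper: true when the magnitude of the rotation exceeds the limit.
// The 45-degree default mirrors the approximate limit in the chart.
function isBeyondRotationLimit(d: TextDetection, limitDegrees = 45): boolean {
  return Math.abs(rotationDegrees(d)) > limitDegrees;
}
```

Only the magnitude is compared, so upside-down text (a top edge pointing in the negative x direction, about 180 degrees) is also flagged.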

Installation

npm install cap-ml

Usage

TextDetector exposes only one method detectText that returns a Promise with an array of text detections -

// Orientation here is not the current orientation of the image, but the direction in which the image should be turned to make it upright
detectText(filename: string, orientation?: ImageOrientation): Promise<TextDetection[]>

TextDetection looks like -

interface TextDetection {
  bottomLeft: [number, number]; // [x-coordinate, y-coordinate]
  bottomRight: [number, number]; // [x-coordinate, y-coordinate]
  topLeft: [number, number]; // [x-coordinate, y-coordinate]
  topRight: [number, number]; // [x-coordinate, y-coordinate]
  text: string;
}

ImageOrientation is an enum -

enum ImageOrientation {
  Up = "UP",
  Down = "DOWN",
  Left = "LEFT",
  Right = "RIGHT",
}
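Because ImageOrientation is a string enum, a value coming from elsewhere in an app (for example, a hypothetical orientation picker in the UI) can be validated before it is passed to detectText. A minimal sketch, with a local copy of the enum and a hypothetical parseOrientation helper that is not part of cap-ml:

```typescript
// Local copy of the enum documented above (in an app, import it from 'cap-ml').
enum ImageOrientation {
  Up = "UP",
  Down = "DOWN",
  Left = "LEFT",
  Right = "RIGHT",
}

// Hypothetical helper: maps "up", " Left ", "DOWN", ... to the matching enum member,
// or returns undefined for anything else so the caller can fall back to the default.
function parseOrientation(value: string): ImageOrientation | undefined {
  switch (value.trim().toUpperCase()) {
    case ImageOrientation.Up:
      return ImageOrientation.Up;
    case ImageOrientation.Down:
      return ImageOrientation.Down;
    case ImageOrientation.Left:
      return ImageOrientation.Left;
    case ImageOrientation.Right:
      return ImageOrientation.Right;
    default:
      return undefined;
  }
}
```

When the result is undefined, the orientation argument can simply be omitted, since it is optional in the detectText signature.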

bottomLeft[x,y], bottomRight[x,y], topLeft[x,y] and topRight[x,y] give the corners of the bounding quadrangle for the detected text. Often this is a rectangle, but it need not be if the text is skewed.
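Since the quadrangle may be skewed, code that needs a plain rectangle (for example, to draw a highlight) has to derive one from the four corners. A minimal sketch, assuming the TextDetection shape above; axisAlignedBox and AxisAlignedBox are hypothetical names, not part of cap-ml. It only takes the extreme x and y values, so it works whatever the coordinate origin or units (pixels or normalized values, which this README does not specify).

```typescript
// Local copy of the documented shape (in an app, import TextDetection from 'cap-ml').
type Point = [number, number]; // [x-coordinate, y-coordinate]

interface TextDetection {
  bottomLeft: Point;
  bottomRight: Point;
  topLeft: Point;
  topRight: Point;
  text: string;
}

// Smallest axis-aligned rectangle that contains all four corners.
interface AxisAlignedBox {
  minX: number;
  minY: number;
  maxX: number;
  maxY: number;
  width: number;
  height: number;
}

// Hypothetical helper: collects the corner coordinates and keeps the extremes,
// which also handles skewed (non-rectangular) quadrangles.
function axisAlignedBox(d: TextDetection): AxisAlignedBox {
  const corners: Point[] = [d.topLeft, d.topRight, d.bottomRight, d.bottomLeft];
  const xs = corners.map(([x]) => x);
  const ys = corners.map(([, y]) => y);
  const minX = Math.min(...xs);
  const maxX = Math.max(...xs);
  const minY = Math.min(...ys);
  const maxY = Math.max(...ys);
  return { minX, minY, maxX, maxY, width: maxX - minX, height: maxY - minY };
}
```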

Example Usage

import { Plugins, CameraResultType, CameraSource } from '@capacitor/core';
import { TextDetector, TextDetection, ImageOrientation } from 'cap-ml';
const { Camera } = Plugins;

and used like:

  // Inside an async function: prompt the user to select a picture
  const imageFile = await Camera.getPhoto({
    resultType: CameraResultType.Uri,
    source: CameraSource.Photos,
  });

  // Pass the picture to the 'CapML' plugin
  const td = new TextDetector();
  const textDetections = await td.detectText(imageFile.path!);

  // Or, with an orientation:
  // const textDetections = await td.detectText(imageFile.path!, ImageOrientation.Up);

  // textDetections is an array of detected texts and their bounding box coordinates,
  // which can be accessed like this:
  textDetections.forEach((detection: TextDetection) => {
    const text = detection.text;
    const bottomLeft = detection.bottomLeft;
    const bottomRight = detection.bottomRight;
    const topLeft = detection.topLeft;
    const topRight = detection.topRight;
  });
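When an app only needs the recognized text rather than each field, a minimal sketch like the following joins the non-empty detections, one per line, in the order the plugin returned them (which is not necessarily reading order; this README does not specify an order). TextDetectorLike, detectionsToText and extractText are hypothetical names, not part of cap-ml. The structural interface mirrors the documented detectText signature, so the helper accepts a TextDetector instance or a stand-in object for testing.

```typescript
// Local copies of the documented types (in an app, import them from 'cap-ml').
type Point = [number, number]; // [x-coordinate, y-coordinate]

interface TextDetection {
  bottomLeft: Point;
  bottomRight: Point;
  topLeft: Point;
  topRight: Point;
  text: string;
}

enum ImageOrientation {
  Up = "UP",
  Down = "DOWN",
  Left = "LEFT",
  Right = "RIGHT",
}

// Any object with a detectText method matching the documented signature.
interface TextDetectorLike {
  detectText(filename: string, orientation?: ImageOrientation): Promise<TextDetection[]>;
}

// Trims each detection's text, drops empty results, and joins the rest with newlines.
function detectionsToText(detections: TextDetection[]): string {
  return detections
    .map((d) => d.text.trim())
    .filter((t) => t.length > 0)
    .join("\n");
}

// Runs detection on one image and returns all recognized text as a single string.
// The orientation argument is only forwarded when the caller supplies one.
async function extractText(
  detector: TextDetectorLike,
  filename: string,
  orientation?: ImageOrientation,
): Promise<string> {
  const detections =
    orientation === undefined
      ? await detector.detectText(filename)
      : await detector.detectText(filename, orientation);
  return detectionsToText(detections);
}
```

In the example above, this could be called as const text = await extractText(td, imageFile.path!);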

If you're using it in an Android app (generated through Ionic), there is an additional step: register the plugin in the app's MainActivity.java.

  • Import the plugin: import com.bendyworks.capML.CapML;
  • Register the plugin: in the same file, inside the init call in onCreate, add the line add(CapML.class);

A complete example can be found in the examples folder - examples/text-detection/ImageReader

If you're planning to use the Camera plugin, as the example project does, or to pick an image from the photo library -

For iOS:

  • Open the app in Xcode by running npx cap open ios from the sample app's root directory, i.e. here, examples/text-detection/ImageReader
  • Open Info.plist
  • Add the corresponding permissions to the app -
    • Privacy - Camera Usage Description: To Take Photos and Video
    • Privacy - Photo Library Additions Usage Description: Store camera photos to camera
    • Privacy - Photo Library Usage Description: To Pick Photos from Library

For Android:

  • Open the app in Android Studio by running npx cap open android from the sample app's root directory, i.e. here, examples/text-detection/ImageReader

  • Open app/manifests/AndroidManifest.xml

  • Add the corresponding permissions to the app -

    • android.permission.INTERNET
    • android.permission.READ_EXTERNAL_STORAGE
    • android.permission.WRITE_EXTERNAL_STORAGE
    • android.permission.CAMERA
  • Note: The sample app is set up to download Firebase's OCR model for text detection when the app is installed. If the app errors out with something like:

      Considering local module com.google.android.gms.vision.ocr:0 and remote module com.google.android.gms.vision.ocr:0.
      E/Vision: Error loading module com.google.android.gms.vision.ocr optional module true: com.google.android.gms.dynamite.DynamiteModule$LoadingException: No acceptable module found. Local version is 0 and remote version is 0.

    This is a known bug with Google Play Services.

    Follow these steps -

    1. Uninstall app from the device/emulator.
    2. Update 'Google Play Services' - make sure you have the latest version.
    3. Clear the cache and storage for 'Google Play Services'.
    4. Restart the device/emulator
    5. Install and run the app.

Development

After checking out the repo,

  • Run npm install to install dependencies. The plugin should be ready at this point. To test it out -
  • navigate to examples/text-detection/ImageReader
  • run npm install to install dependencies
  • run npm run build && npx cap sync to sync the project with iOS and Android

iOS Development

  • Run npx capacitor open ios to open the Xcode project.
  • Run the Xcode project on either a simulator or a device.
  • For each change in the JavaScript part of the app, run npm run build && npx cap sync ios to deploy the changes to the iOS app

    (or)

  • (recommended) Enable live reload of the app using ionic capacitor run ios --livereload
  • Plugin code is located at Pods/DevelopmentPods/CapML
  • Plugin.swift is the entry point to the Plugin.

Android Development

Step 1: Open Android Project

  • run npx capacitor open android to open up an Android Studio project.

Step 2: Create Firebase Project and App

  • Navigate to https://console.firebase.google.com/ and sign in

  • Click 'Add Project' and follow the steps (enable Google Analytics if you like, but the project doesn't need it)

  • Once the project is created, click the Android icon to create an Android app.

  • Register App:

    • Enter the package name. It must match the package name of your app. For example, the package name of the example project here is com.bendyworks.CapML.ImageReader; enter that if you wish to run the sample project. If you're setting up a new project, enter the package name of that app (Tip: you can find it in app/AndroidManifest.xml). Click 'Register App'.
    • Download google-services.json
    • Place the downloaded google-services.json in your project's app directory.
  • Add Firebase SDK: The example project should already have this in place, but if you're setting up a new project, follow the instructions to modify build.gradle to use the downloaded google-services.json.

  • Once the build changes are in place, perform a Gradle sync (Android Studio prompts for a Gradle sync as soon as a build file changes).

Step 3: Making changes and running the app

  • The example project is already set up to use the plugin. If you're setting up a new project, register the plugin in the project's MainActivity.java:
    • Import the plugin: import com.bendyworks.capML.CapML;
    • Register the plugin: in the same file, inside the init call in onCreate, add the line add(CapML.class);

  • Build and run the project on either an emulator or a device.

  • For each change in the JavaScript part of the app, run npm run build && npx cap sync android to deploy the changes to the Android app

    (or)

  • (recommended) Enable live reload of the app, using ionic capacitor run android --livereload

  • Plugin code is located at android-cap-ml/java/com.bendyworks.capML

  • CapML.java is the entry point to the Plugin. (Note: When plugin code is updated, make sure to rebuild the project before running it.)

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/bendyworks/cap-ml.

If you're curious about the implementation, here's an extensive blog post series - https://bendyworks.com/blog/capacitor-plugin-for-text-detection-part1

License

Hippocratic License Version 2.0.

For more information, refer to LICENSE file