penteract

The native Node.js bindings to the Tesseract OCR project.

  • Uses native Node.js bindings, so no tesseract command-line process is spawned.
  • Asynchronous I/O: image reading and processing run in an isolated event loop backed by libuv.
  • Supports reading image data from JavaScript buffers.

Contributions are welcome.

Install

First of all, a g++ 4.9 compiler is required.

Before installing penteract, install the following dependencies:

$ brew install pkg-config tesseract # macOS

Then install penteract with npm:

$ npm install penteract

To Use with Electron

Native Node.js modules have to be built against the runtime that loads them. To use penteract with Electron, add a .npmrc file to the root of your Electron project before running npm install:

runtime = electron
; The version of the local electron,
; use `npm ls electron` to figure it out
target = 1.7.5
target_arch = x64
disturl = https://atom.io/download/atom-shell

Usage

Recognize an Image Buffer

import {
  recognize
} from 'penteract'

import path from 'path'
import fs from 'fs-extra'

const filepath = path.join(__dirname, 'test', 'fixtures', 'penteract.jpg')

fs.readFile(filepath).then(recognize).then(console.log) // 'penteract'

Recognize a Local Image File

import {
  fromFile
} from 'penteract'

fromFile(filepath, {lang: 'eng'}).then(console.log)     // 'penteract'

recognize(image [, options])

  • image Buffer the content buffer of the image file.
  • options PenteractOptions= optional

Returns Promise.<String> that resolves to the recognized text on success.

fromFile(filepath [, options])

  • filepath Path the file path of the image file.
  • options PenteractOptions=

Returns Promise.<String>
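
Because fromFile returns a promise, several files can be processed concurrently with Promise.all. The following is a minimal sketch: recognizeAll is a hypothetical helper that is not part of penteract, and it takes fromFile as an argument so the snippet does not depend on how penteract is imported.

```javascript
// Hypothetical helper (not part of penteract): runs OCR on several files
// concurrently and resolves to an array of { filepath, text } records.
// `fromFile` is expected to behave like penteract's `fromFile`.
function recognizeAll (fromFile, filepaths, options) {
  return Promise.all(
    filepaths.map(filepath =>
      fromFile(filepath, options).then(text => ({ filepath, text }))
    )
  )
}

// Usage, assuming `fromFile` has been imported from 'penteract':
// recognizeAll(fromFile, ['a.jpg', 'b.jpg'], {lang: 'eng'}).then(console.log)
```

Promise.all rejects as soon as any single recognition fails, so a batch that should tolerate bad files would need a per-file catch instead.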

PenteractOptions Object

{
  // @type `(String|Array.<String>)=eng`,
  //
  // Specifies language(s) used for OCR.
  //   Run `tesseract --list-langs` in command line for all supported languages.
  //   Defaults to `'eng'`.
  //
  // To specify multiple languages, use an array.
  //   English and Simplified Chinese, for example:
  // ```
  // lang: ['eng', 'chi_sim']
  // ```
  lang: 'eng'
}
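
As a sketch of how the lang option might be built from user input, the helper below turns a comma-separated string such as 'eng,chi_sim' into a PenteractOptions object. toPenteractOptions is a hypothetical name used for illustration and is not part of penteract.

```javascript
// Hypothetical helper (not part of penteract): builds a PenteractOptions
// object from a comma-separated language string.
function toPenteractOptions (langs) {
  const list = String(langs || '')
    .split(',')
    .map(lang => lang.trim())
    .filter(Boolean)

  // Fall back to the documented default when nothing usable was given.
  if (list.length === 0) {
    return { lang: 'eng' }
  }

  // A single language is passed as a string, several as an array.
  return { lang: list.length === 1 ? list[0] : list }
}
```

The result can then be passed as the second argument, for example recognize(image, toPenteractOptions('eng,chi_sim')).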

Promise.reject(error)

  • error Error The JavaScript Error instance
    • code String Error code.
    • message String Error message.
    • other properties of Error.

code: ERR_READ_IMAGE

Rejects if it fails to read image data from file or buffer.

code: ERR_INIT_TESSER

Rejects if tesseract fails to initialize.
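
The two error codes above can be told apart in a catch handler. The following is a minimal sketch: describeOcrError is a hypothetical helper, and the returned strings are illustrative rather than messages produced by penteract.

```javascript
// Hypothetical helper (not part of penteract): maps the documented
// error codes to a short, human-readable description.
function describeOcrError (error) {
  switch (error && error.code) {
    case 'ERR_READ_IMAGE':
      return 'The image could not be read from the file or buffer.'
    case 'ERR_INIT_TESSER':
      return 'Tesseract failed to initialize.'
    default:
      // Any other rejection: fall back to the message of the Error, if any.
      return error && error.message ? error.message : 'Unknown error'
  }
}

// Usage, assuming `recognize` and an image buffer `image` are in scope:
// recognize(image).catch(error => console.error(describeOcrError(error)))
```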

Example of Using with Electron

// For details of `mainWindow: BrowserWindow`, see
// https://github.com/electron/electron/blob/master/docs/api/browser-window.md
mainWindow.capturePage({
  x: 10,
  y: 10,
  width: 100,
  height: 10
}, (data) => {
  recognize(data.toPNG()).then(console.log)
})

Compiling Troubles

For macOS users who have trouble compiling, running the following command resolves most problems:

$ xcode-select --install

If the following error appears:

xcode-select: error: tool 'xcodebuild' requires Xcode, but active developer directory '/Library/Developer/CommandLineTools' is a command line tools instance

switch the active developer directory to the full Xcode installation:

$ sudo xcode-select -s /Applications/Xcode.app/Contents/Developer

License

MIT