DocumentVision

DocumentVision is a Node.js library for processing and understanding scanned documents.

Features

Image loading using jpeg-compressor, LodePNG and pixel buffers
Image manipulation using Leptonica (Version 1.69)
OCR using Tesseract (Version 3.02, SVN r866)
OMR for Barcodes using ZXing (Version 2.3.0)

Installation

$ npm install dv

Quick Start

Once you've installed the library, download a sample image and save it as textpage300.png. Any other image containing simple text at 300 dpi or higher will also work. Now run the following code snippet to recognize text from your image:

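var dv = require('dv');
var fs = require('fs');
// Load the PNG file into a DocumentVision image.
var image = new dv.Image('png', fs.readFileSync('textpage300.png'));
// Run OCR with the English language data and print the plain text result.
var tesseract = new dv.Tesseract('eng', image);
console.log(tesseract.findText('plain'));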
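
If you want to try several images, the Quick Start steps can be wrapped in a small helper. The following is only a sketch: formatFromPath and recognizeFile are hypothetical names, not part of dv, and the 'jpg' format string is an assumption that should be checked against the library documentation ('png' is the only value shown above).

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical helper: derive the format string for dv.Image from a file name.
// 'png' comes from the Quick Start; 'jpg' is an assumption to verify.
function formatFromPath(file) {
  var ext = path.extname(file).toLowerCase();
  if (ext === '.png') return 'png';
  if (ext === '.jpg' || ext === '.jpeg') return 'jpg';
  throw new Error('Unsupported image extension: ' + ext);
}

// Hypothetical helper: run the Quick Start steps on one file.
// The dv module is passed in, so the caller decides how it is loaded.
function recognizeFile(dv, file, lang) {
  var image = new dv.Image(formatFromPath(file), fs.readFileSync(file));
  var tesseract = new dv.Tesseract(lang || 'eng', image);
  return tesseract.findText('plain');
}
```

With the library installed, a call such as console.log(recognizeFile(require('dv'), 'textpage300.png')) should behave like the snippet above.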

Versioning

DocumentVision is maintained under the Semantic Versioning guidelines as much as possible:

Version number format is <major>.<minor>.<patch>
Breaking backward compatibility bumps the major (resetting minor and patch)
New additions that do not break backward compatibility bump the minor (resetting patch)
Bug fixes and other changes bump the patch
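
The bump rules above can be illustrated with a few lines of JavaScript. This is a minimal sketch of the rules only; bumpVersion is a hypothetical helper and is not part of the library.

```javascript
// Hypothetical helper illustrating the bump rules; not part of the dv API.
function bumpVersion(version, part) {
  var match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error('Expected <major>.<minor>.<patch>, got: ' + version);
  }
  var major = Number(match[1]);
  var minor = Number(match[2]);
  var patch = Number(match[3]);
  switch (part) {
    case 'major': // breaking change: reset minor and patch
      return (major + 1) + '.0.0';
    case 'minor': // backward-compatible addition: reset patch
      return major + '.' + (minor + 1) + '.0';
    case 'patch': // bug fix or other change
      return major + '.' + minor + '.' + (patch + 1);
    default:
      throw new Error('Unknown part: ' + part);
  }
}
```

For example, bumpVersion('1.4.2', 'minor') gives '1.5.0', and bumpVersion('1.4.2', 'major') gives '2.0.0'.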

ytham/node-dv