Web service for image reading with machine learning

Extracting content from images using Tesseract OCR

This is a simple web service that extracts text from images in any language supported by Tesseract.

What does it do?
  • receives an image
  • saves the image to disk
  • runs Tesseract OCR on the saved file to extract the text
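
The steps above can be sketched with Node built-ins only. This is a minimal illustration, not the service's actual code: the helper names buildTesseractArgs and extractText, the temporary directory, and the direct call to the tesseract command-line tool are all assumptions. It presumes the tesseract binary is on the PATH.

```javascript
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");
const fs = require("node:fs/promises");
const os = require("node:os");
const path = require("node:path");

const execFileAsync = promisify(execFile);

// Hypothetical helper: turns a config object into arguments for the tesseract CLI.
function buildTesseractArgs(imagePath, { lang = "eng", oem = 1, psm = 3 } = {}) {
  // Using "stdout" as the output base makes tesseract print the text
  // instead of writing it to a file.
  return [imagePath, "stdout", "-l", lang, "--oem", String(oem), "--psm", String(psm)];
}

// Hypothetical helper: covers the "save" and "OCR" steps for an image
// that has already been received as a Buffer.
async function extractText(imageBuffer, originalName, config) {
  // Save the image to disk (the temporary directory is an assumption).
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "ocr-"));
  const imagePath = path.join(dir, path.basename(originalName));
  await fs.writeFile(imagePath, imageBuffer);
  try {
    // Run Tesseract on the saved file and return the recognized text.
    const { stdout } = await execFileAsync("tesseract", buildTesseractArgs(imagePath, config));
    return stdout.trim();
  } finally {
    // Remove the temporary file whether or not OCR succeeded.
    await fs.rm(dir, { recursive: true, force: true });
  }
}
```

A route handler could then call extractText(buffer, "scan.png", { lang: "por" }) and send the returned string back to the client.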

Dependencies

You can use the container available in the repository by running:

docker-compose up -d --build

Alternatively, run the service directly on the host. This requires installing tesseract-ocr and tesseract-ocr-por (the latter enables recognition of Portuguese text). Then run:

node index.js

The scripts section of package.json contains some helper scripts.

How to use?

Send a POST request with a multipart/form-data payload in which the image parameter is a file containing the desired image:

http://localhost:8888/upload/image
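
As an illustration, a client for this endpoint might look like the sketch below. It assumes Node 18 or later (for the global fetch, FormData and Blob); the helper names buildUploadForm and uploadImage are hypothetical, and because the response format is not documented here, the body is returned as plain text.

```javascript
const fs = require("node:fs/promises");
const path = require("node:path");

// Hypothetical helper: builds the multipart payload with a single
// file field named "image", as the endpoint expects.
function buildUploadForm(imageBytes, filename) {
  const form = new FormData();
  form.append("image", new Blob([imageBytes]), filename);
  return form;
}

// Hypothetical helper: reads a file from disk and posts it to the service.
async function uploadImage(filePath, url = "http://localhost:8888/upload/image") {
  const bytes = await fs.readFile(filePath);
  const form = buildUploadForm(bytes, path.basename(filePath));
  // fetch sets the multipart Content-Type header (with boundary) automatically.
  const response = await fetch(url, { method: "POST", body: form });
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  // The response format is an unknown here, so it is returned as raw text.
  return response.text();
}
```

For example, uploadImage("./receipt.png").then(console.log) would print whatever the service returns for that image.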

Using other languages

The default language is eng (English). To use a different one, install the matching language package; see the list of languages supported by Tesseract and install the one you need using its prefix:

tesseract-ocr-PREFIX

Then change the lang value of the config object in src/controllers/Uploads.js:

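	const config = {
		lang: "PREFIX", // language code of the installed package, e.g. "por"
		oem: 1,         // OCR engine mode 1: LSTM neural network engine only
		psm: 3,         // page segmentation mode 3: fully automatic, no orientation/script detection
	}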
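
Before changing lang, it can help to confirm that the language data is actually installed. The sketch below relies on the tesseract --list-langs command; the helper names parseLangList and isLangInstalled are hypothetical, and the parser assumes the output is one header line followed by one language code per line.

```javascript
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");

const execFileAsync = promisify(execFile);

// Hypothetical helper: parses the text printed by `tesseract --list-langs`.
// Assumes the first line is a header and every following non-empty line
// is a single language code.
function parseLangList(output) {
  return output
    .split(/\r?\n/)
    .slice(1)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Hypothetical helper: resolves to true when the given language code is installed.
async function isLangInstalled(lang) {
  const { stdout, stderr } = await execFileAsync("tesseract", ["--list-langs"]);
  // Fall back to stderr in case this Tesseract build writes the list there.
  return parseLangList(stdout || stderr).includes(lang);
}
```

Calling isLangInstalled("por") at startup and logging a warning when it resolves to false gives a clearer failure than an OCR error on the first request.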

Comments:

Some images have poor quality, bad lighting, or low contrast. Preprocessing such images with ImageMagick before Tesseract runs can produce a more accurate result.
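
One possible preprocessing step is sketched below. It is not part of the service: the helper names buildPreprocessArgs and preprocessImage are hypothetical, and the chosen operations (grayscale, contrast normalization, 2x upscaling) are only a starting point that should be tuned per image set. The binary is convert on ImageMagick 6 and magick on ImageMagick 7.

```javascript
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");

const execFileAsync = promisify(execFile);

// Hypothetical helper: ImageMagick arguments that convert the image to
// grayscale, stretch its contrast, and double its size.
function buildPreprocessArgs(inputPath, outputPath) {
  return [inputPath, "-colorspace", "Gray", "-normalize", "-resize", "200%", outputPath];
}

// Hypothetical helper: writes the cleaned-up image to outputPath and returns that path.
// Pass "magick" as the binary when using ImageMagick 7.
async function preprocessImage(inputPath, outputPath, binary = "convert") {
  await execFileAsync(binary, buildPreprocessArgs(inputPath, outputPath));
  return outputPath;
}
```

The returned path can then be handed to Tesseract in place of the original upload.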

@desposivel