Optical Character Recognition Service

#OCRJava

Optical Character Recognition Service. This project was built for Innovation Day practice and is also an important asset of the Bluemix and Cognitive CoEs.

Language: Java · Watson: Text-to-Speech · License: Apache 2.0

#Prerequisite

#Installation guide

Windows

macOS

  • Install Homebrew
	/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
  • Install Tesseract
	brew install tesseract
  • If you do not have access to /usr/local, take ownership of the folder or grant it 766 permissions
	sudo chown -R $USER /usr/local
  • Install Xcode; Tesseract is distributed as source code only, so it has to be compiled locally
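
Before running the application, it can help to confirm that the `tesseract` binary installed above is actually reachable. The following is a minimal sketch and not part of OCRJava: `TesseractLocator` and `findOnPath` are hypothetical names, and the sketch only assumes that Homebrew placed the executable in a directory listed in `PATH`.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; it does not exist in the OCRJava sources.
final class TesseractLocator {

    // Looks in each directory of a PATH-style string for an executable file with the given name.
    static Optional<Path> findOnPath(String executable, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue; // skip empty PATH entries
            }
            Path candidate = Paths.get(dir, executable);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> found = findOnPath("tesseract", System.getenv("PATH"));
        System.out.println(found.map(p -> "tesseract found at " + p)
                                .orElse("tesseract not found on PATH"));
    }
}
```

If the lookup reports that nothing was found, re-running `brew install tesseract` or checking the permissions on /usr/local is a reasonable next step.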

Any platform

  • Clone the repository
	git clone git@github.com:CognitiveBuild/OCRJava.git
  • When running the code as a web project in WebSphere Application Server Liberty Profile, download jai_imageio-1.1.jar and put it into the /Library/Java/JavaVirtualMachines/jdk{version}.jdk/Contents/Home/jre/lib/ext folder. The underlying problem may be caused by OSGi
  • Add the Text-to-Speech credentials, obtained from your Bluemix account, in the code file /OCRJava/src/com/ibm/waston/WastonSpeechHelper.java
	private static final String TEXT_TO_SPEECH_USERNAME = "your_username";
	private static final String TEXT_TO_SPEECH_PASSWORD = "your_password";
  • Right-click the Chatbot project and choose Run As > Run on Server to open the OCR sample application
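
A common mistake with the credentials step is to leave the `your_username` and `your_password` placeholders in place. The sketch below shows one possible way to catch that early. It is an illustration only and not how OCRJava is written: the project hardcodes the two constants, whereas this sketch assumes the values are supplied through environment variables. The class `TextToSpeechCredentials`, its methods, and the variable names are hypothetical.

```java
import java.util.Map;

// Hypothetical helper for illustration; OCRJava itself hardcodes the two constants.
final class TextToSpeechCredentials {
    static final String PLACEHOLDER_USERNAME = "your_username";
    static final String PLACEHOLDER_PASSWORD = "your_password";

    final String username;
    final String password;

    TextToSpeechCredentials(String username, String password) {
        this.username = username;
        this.password = password;
    }

    // Reads both values from a map of environment variables (names are assumptions),
    // falling back to the README placeholders when a variable is missing.
    static TextToSpeechCredentials fromEnvironment(Map<String, String> env) {
        String user = env.getOrDefault("TEXT_TO_SPEECH_USERNAME", PLACEHOLDER_USERNAME);
        String pass = env.getOrDefault("TEXT_TO_SPEECH_PASSWORD", PLACEHOLDER_PASSWORD);
        return new TextToSpeechCredentials(user, pass);
    }

    // True only when both values are non-empty and differ from the placeholders.
    boolean isConfigured() {
        return username != null && !username.isEmpty() && !PLACEHOLDER_USERNAME.equals(username)
            && password != null && !password.isEmpty() && !PLACEHOLDER_PASSWORD.equals(password);
    }

    public static void main(String[] args) {
        TextToSpeechCredentials credentials = fromEnvironment(System.getenv());
        if (!credentials.isConfigured()) {
            System.err.println("Text-to-Speech credentials are not set; obtain them from your Bluemix account.");
        } else {
            System.out.println("Text-to-Speech credentials found for user " + credentials.username);
        }
    }
}
```

Reading the values from the environment also keeps them out of version control, though that is a design choice rather than something the project requires.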

#How to use

  • Open the application in Firefox or Google Chrome, click the Choose a file button, then click the Recognize button

#Dependencies

  • Tesseract for Java
  • Apache Commons IO
  • FastJSON
  • jai-imageio
  • JNA
  • Apache HttpClient (upload)
  • Watson Java SDK

#Issues

  • jai_imageio-1.1.jar cannot be loaded from the project dependencies when the Java runtime is the Liberty Profile
  • Some Chinese characters are not recognized well because of font issues, so Tesseract needs to be trained; see the reference below (in Chinese): http://www.cnblogs.com/mjorcen/p/3800739.html
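
For the jai_imageio loading issue, one way to see whether the plugin was picked up is to list the image formats that `javax.imageio.ImageIO` can currently read. The sketch below is a diagnostic aid under stated assumptions, not part of OCRJava: `ImageIoFormatCheck` is a hypothetical name, and it assumes that a loaded jai-imageio jar registers extra readers such as TIFF. On JDK 9 and later a TIFF reader ships with the JDK itself, so the check is only informative on older runtimes.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import javax.imageio.ImageIO;

// Hypothetical diagnostic helper; it does not exist in the OCRJava sources.
final class ImageIoFormatCheck {

    // Returns the lower-cased, sorted reader format names that ImageIO currently knows about.
    static Set<String> readerFormats() {
        Set<String> formats = new TreeSet<>();
        for (String name : ImageIO.getReaderFormatNames()) {
            formats.add(name.toLowerCase(Locale.ROOT));
        }
        return formats;
    }

    // Case-insensitive check for a single format name such as "tiff" or "png".
    static boolean canRead(String format) {
        return readerFormats().contains(format.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println("ImageIO reader formats: " + readerFormats());
        System.out.println("TIFF reader available: " + canRead("tiff"));
    }
}
```

Running this inside the same runtime as the web project (for example from a servlet or a startup hook) gives a more reliable answer than running it from a standalone JVM, because the class loading setup differs between the two.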

#License

Copyright 2016 GCG GBS CTO Office under the Apache 2.0 license.