#OCRJava
Optical Character Recognition Service. This is the project for Innovation Day practice, and also an important asset of the Bluemix and Cognitive CoEs.
#Prerequisite
- Register your Bluemix account
- Create a Text to Speech service
- Install the Bluemix and CF CLI
- Install Xcode (macOS only)
- Install Eclipse Java EE IDE for Web Developers as your IDE (Download)
- Set up Tomcat or WebSphere Application Server Liberty Profile in Eclipse for debugging. If you are installing WebSphere Application Server Liberty Profile, drag and drop this link into Eclipse.
#Installation guide
Windows
- Install tesseract: download the Windows installer (tesseract-ocr-setup-3.02.02.exe) and run it
macOS
- Install Homebrew
`/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"`
- Install tesseract
`brew install tesseract`
- If you don't have access to the folder, take ownership of it or grant 766 permissions
`sudo chown -R $USER /usr/local`
- Install Xcode, because tesseract is provided as source code only and needs to be compiled
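Once tesseract is installed, it can help to confirm from Java that the binary is reachable before running the web project. The following is a minimal sketch and not part of OCRJava: the class `TesseractCheck` and the method `isAvailable` are hypothetical names, and the check assumes that `tesseract` is on the `PATH` and that `tesseract -v` exits with status 0.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not in the OCRJava sources): checks that a binary can be launched. */
final class TesseractCheck {

    /** Returns true if "<binary> -v" can be started and exits with status 0. */
    static boolean isAvailable(String binary) {
        try {
            Process process = new ProcessBuilder(binary, "-v")
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            // Drain the output so the child process cannot block on a full pipe.
            try (InputStream output = process.getInputStream()) {
                byte[] buffer = new byte[1024];
                while (output.read(buffer) != -1) {
                    // output is discarded; only the exit status matters here
                }
            }
            return process.waitFor() == 0;
        } catch (IOException e) {
            // Thrown when the binary cannot be found or started.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAvailable("tesseract")
                ? "tesseract found"
                : "tesseract not found on PATH");
    }
}
```

If the check fails on Windows, the installer directory may simply be missing from the `PATH`.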
Any platform
- Clone the repository with git, or download the source code
`git clone git@github.com:CognitiveBuild/OCRJava.git`
- When running the code in WebSphere Application Server Liberty Profile as a web project, you need to download jai_imageio-1.1.jar and put it into the /Library/Java/JavaVirtualMachines/jdk{version}.jdk/Contents/Home/jre/lib/ext folder. OSGi may be the cause of the problem
- Add the Text-to-Speech credentials in the code file /OCRJava/src/com/ibm/waston/WastonSpeechHelper.java; obtain the credentials from your Bluemix account
private static final String TEXT_TO_SPEECH_USERNAME = "your_username";
private static final String TEXT_TO_SPEECH_PASSWORD = "your_password";
- Right click on the Chatbot project, choose Run As > Run on Server to open the OCR sample application
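To illustrate what the two credential constants above amount to, here is a minimal sketch under the assumption that the service is called with HTTP Basic authentication. This is an assumption, not a description of `WastonSpeechHelper.java`, which may instead pass the values to the Watson Java SDK. The class `BasicAuthSketch` and the method `basicAuthHeader` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical illustration (not in the OCRJava sources) of an HTTP Basic auth header. */
final class BasicAuthSketch {

    /** Builds the value of an Authorization header from a username and password. */
    static String basicAuthHeader(String username, String password) {
        String pair = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholders, matching the constants shown in the step above.
        System.out.println(basicAuthHeader("your_username", "your_password"));
    }
}
```

Keeping real credentials out of version control is worth considering if the placeholders are replaced in a shared repository.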
#How to use
- In Firefox or Google Chrome, click the Choose a file button, then click the Recognize button
#Dependencies
- Tesseract for Java
- Apache Commons IO
- FastJSON
- jai-imageio
- JNA
- Apache HTTP Client (Upload)
- Watson Java SDK
#Issues
- jai_imageio-1.1.jar cannot be loaded from the project dependencies when the Java runtime is the Liberty Profile
- Some Chinese characters are not recognized well because of font issues, so tesseract needs to be trained; see the reference below (in Chinese): http://www.cnblogs.com/mjorcen/p/3800739.html
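Trained or downloaded language data is selected on the tesseract command line with the `-l` option. The sketch below only shows how such a command could be assembled from Java. It is not how OCRJava invokes tesseract (the project lists Tesseract for Java as a dependency), and the class `OcrCommand`, the method `build`, and the file names `sample.png` and `result` are hypothetical. `chi_sim` is the name of the standard Simplified Chinese data and is assumed to be installed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not in the OCRJava sources): assembles a tesseract command line. */
final class OcrCommand {

    /**
     * Builds "tesseract <image> <outputBase> [-l <language>]".
     * Tesseract writes the recognized text to <outputBase>.txt.
     */
    static List<String> build(String imagePath, String outputBase, String language) {
        List<String> command = new ArrayList<>();
        command.add("tesseract");
        command.add(imagePath);
        command.add(outputBase);
        if (language != null && !language.isEmpty()) {
            command.add("-l");      // language option
            command.add(language);  // e.g. "chi_sim", or the name of custom trained data
        }
        return command;
    }

    public static void main(String[] args) {
        List<String> command = build("sample.png", "result", "chi_sim");
        // Only prints the command; pass the list to ProcessBuilder to actually run it.
        System.out.println(String.join(" ", command));
    }
}
```

After training, the name of the resulting traineddata file (without its extension) would take the place of `chi_sim`.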
#License
Copyright 2016 GCG GBS CTO Office under the Apache 2.0 license.