keensoft/alfresco-simple-ocr

OCRmyPDF creates error when run from alfresco-simple-ocr

koebln opened this issue · 2 comments

Alfresco Repository: 5.2.0 (r135134-b14) Community-Edition
Simple-OCR: 1.1.1
OS: Ubuntu Linux 16.04 (Kernel 4.4.0-71-generic)
OCRmyPDF: 4.5.3
Python 3: 3.5.2

I installed simple-ocr-repo.amp on my Alfresco server to work with OCRmyPDF. My alfresco-global.properties contains the following lines:

### Simple OCR Action Properties
ocr.command=/usr/local/bin/ocrmypdf
ocr.output.verbose=true
ocr.output.file.prefix.command=
ocr.extra.commands=-l deu -c -d
ocr.server.os=linux

The folder rule is working, and I can see running tasks from ocrmypdf, unpaper, and tesseract.

The source file .../tomcat/temp/Alfresco/OCRTransformWorker_source_1267824845391978282.pdf is created, but the output file .../tomcat/temp/Alfresco/OCRTransformWorker_source_1267824845391978282_ocr.pdf is not.

alfresco.log contains:

Execution result: 
   os:         Linux
   command:    /usr/local/bin/ocrmypdf -l deu -c -d /home/andreas/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_1267824845391978282.pdf /home/andreas/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_1267824845391978282_ocr.pdf
   succeeded:  true
   exit code:  15
   out:        
   err:        /usr/bin/python3: /home/andreas/alfresco-community/common/lib/libz.so.1: no version information available (required by /usr/bin/python3)
WARNING -    1: [tesseract] lots of diacritics - possibly poor OCR
  ERROR - Unrecoverable error: rangecheck in .
	at es.keensoft.alfresco.ocr.OCRTransformWorker.transform(OCRTransformWorker.java:72)

Running the same command directly in the shell creates the target file without any problems. Is there a problem with the environment? Replacing Alfresco's libz.so.1 with the OS version did not help and caused another, similar error in python3.

You should probably isolate the OCR program's execution from Alfresco to avoid library conflicts. Try one of the methods described at https://github.com/keensoft/alfresco-simple-ocr/wiki/FAQ
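To illustrate what isolation means here, the following is a minimal sketch, not one of the FAQ's methods. It assumes the conflict comes from library-path variables that the Alfresco/Tomcat process passes to child processes, which the libz.so.1 path in the log suggests but does not prove. The names `LEAKY_VARS`, `clean_env`, and `run_isolated` are hypothetical and belong neither to Simple OCR nor to OCRmyPDF.

```python
import os
import subprocess

# Assumption: these are the variables through which Alfresco's bundled
# libraries (e.g. common/lib/libz.so.1) reach child processes.
LEAKY_VARS = ("LD_LIBRARY_PATH", "LD_PRELOAD", "PYTHONHOME", "PYTHONPATH")


def clean_env(env=None):
    """Return a copy of env (default: os.environ) without the leaky variables."""
    source = os.environ if env is None else env
    return {key: value for key, value in source.items() if key not in LEAKY_VARS}


def run_isolated(args, env=None):
    """Run args with the cleaned environment and capture its output as text."""
    return subprocess.run(
        args,
        env=clean_env(env),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
```

Calling `run_isolated` with the full ocrmypdf command line from alfresco.log, from a process started with Alfresco's environment, would show whether the "rangecheck" error goes away once those variables are removed. If it does, any approach that starts the OCR program with a clean environment should behave the same way.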

It is working now. I used the sudo variant and everything looks good. Thank you for the support.