OCRmyPDF creates error when run from alfresco-simple-ocr
koebln opened this issue · 2 comments
Alfresco Repository: 5.2.0 (r135134-b14) Community-Edition
Simple-OCR: 1.1.1
OS: Ubuntu Linux 16.04 (Kernel 4.4.0-71-generic)
OCRmyPDF: 4.5.3
Python 3: 3.5.2
I installed simple-ocr-repo.amp on my Alfresco Server to work with OCRmyPDF. The alfresco-global.properties contains the following lines:
### Simple OCR Action Properties
ocr.command=/usr/local/bin/ocrmypdf
ocr.output.verbose=true
ocr.output.file.prefix.command=
ocr.extra.commands=-l deu -c -d
ocr.server.os=linux
The folder rule is working, and I can see running tasks from ocrmypdf, unpaper, and tesseract.
The source file .../tomcat/temp/Alfresco/OCRTransformWorker_source_1267824845391978282.pdf is created, but the output file .../tomcat/temp/Alfresco/OCRTransformWorker_source_1267824845391978282_ocr.pdf is not.
alfresco.log contains:
Execution result:
os: Linux
command: /usr/local/bin/ocrmypdf -l deu -c -d /home/andreas/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_1267824845391978282.pdf /home/andreas/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_1267824845391978282_ocr.pdf
succeeded: true
exit code: 15
out:
err: /usr/bin/python3: /home/andreas/alfresco-community/common/lib/libz.so.1: no version information available (required by /usr/bin/python3)
WARNING - 1: [tesseract] lots of diacritics - possibly poor OCR
ERROR - Unrecoverable error: rangecheck in .
at es.keensoft.alfresco.ocr.OCRTransformWorker.transform(OCRTransformWorker.java:72)
Running the same command directly in the shell creates the target file without any problems. Is there a problem with the environment? Replacing Alfresco's libz.so.1 with the OS version did not help and caused another, similar error in python3.
You should probably isolate the OCR program's execution from Alfresco to avoid library conflicts. Try one of the methods described at https://github.com/keensoft/alfresco-simple-ocr/wiki/FAQ
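As a rough illustration of what isolating the execution can look like, here is a minimal wrapper sketch. It assumes the conflict comes from Alfresco exporting LD_LIBRARY_PATH so that its bundled common/lib/libz.so.1 is found before the system library; that is a guess based on the path in the log, not something confirmed in this thread. The wrapper name ocrmypdf-clean and the helper run_clean are hypothetical, and this is not necessarily one of the methods listed in the FAQ.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as /usr/local/bin/ocrmypdf-clean (the name is an assumption).
# Assumption: Alfresco's LD_LIBRARY_PATH makes /usr/bin/python3 load the bundled libz.so.1.

# Path taken from ocr.command in alfresco-global.properties.
OCR_BIN=/usr/local/bin/ocrmypdf

# Run the given command with LD_LIBRARY_PATH removed from its environment.
# Note: "env -u" is provided by GNU coreutils (available on Ubuntu).
run_clean() {
    env -u LD_LIBRARY_PATH "$@"
}

# Only invoke OCRmyPDF when arguments were passed to the wrapper.
if [ "$#" -gt 0 ]; then
    run_clean "$OCR_BIN" "$@"
fi
```

With such a wrapper made executable, ocr.command in alfresco-global.properties would point at the wrapper instead of /usr/local/bin/ocrmypdf, and the remaining properties would stay unchanged.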
It is working now. I used the sudo variant and everything looks good. Thank you for the support.