Is there an easier way to use this OCR post-correction tool in Python, the way we can easily use Tesseract OCR in Python?
I want a simple way to use this awesome library in Python.
In Python we can already use Tesseract OCR; see here how easy it is to use.
If it can be used from Python, then we can also use it on Windows.
I am using Windows 10 64-bit.
@NavpreetDevpuri What do you mean by a simple way?
This repo contains an OCR post-correction tool along with a much improved version of Ocropy 1 and ocrolib, but only for OCR-D – as the description/documentation says.
If you want non-OCR-D CLIs, you'll have to use the ocropus-* tools from old Ocropy 1 (which is Python 2 only).
For Tesseract API in Python, I recommend tesserocr instead of pytesseract.
I don't see how your OS choice is relevant here.
Can we close this?
Thanks for your reply.
I want to know whether there is any way to use this OCR post-correction tool in Python, the way we can easily use Tesseract OCR (an OCR tool) in Python.
It seems I need to set up Docker, as mentioned in the user_guide.
I want to use it in Python without Docker, like Tesseract.
I want to use the methods mentioned in the workflows in an easier way,
something like:
import cv2
import ocrd
import pytesseract

# A list of (processor, parameters) pairs keeps the order of the steps.
# A dict would silently drop the repeated "ocrd-olena-binarize" and
# "ocrd-segment-repair" keys.
config = [
    ("ocrd-olena-binarize", {"impl": "sauvola"}),
    ("ocrd-anybaseocr-crop", None),
    ("ocrd-olena-binarize", {"impl": "kim"}),
    ("ocrd-cis-ocropy-denoise", {"level-of-operation": "page"}),
    ("ocrd-tesserocr-deskew", {"operation_level": "page"}),
    ("ocrd-tesserocr-segment-region", None),
    ("ocrd-segment-repair", {"plausibilize": True}),
    ("ocrd-cis-ocropy-deskew", {"level-of-operation": "region"}),
    ("ocrd-cis-ocropy-clip", {"level-of-operation": "region"}),
    ("ocrd-tesserocr-segment-line", None),
    ("ocrd-segment-repair", {"sanitize": True}),
    ("ocrd-cis-ocropy-dewarp", None),
    ("ocrd-calamari-recognize", {"checkpoint": "/path/to/models/*.ckpt.json"}),
]
img = cv2.imread("someimage.jpg")
# Doing the post-correction magic
# (ocrd.process is the call I wish for; it does not exist today)
processed_img = ocrd.process(img, config)
# Now I can use pytesseract to get text from processed_img
text = pytesseract.image_to_string(processed_img)
print(text)
This tool is awesome, but it should be easy to use.
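One possible way to get close to this without Docker, assuming the OCR-D processors are installed natively and are on PATH, is to call their command-line tools from Python one after another. The sketch below only illustrates that idea and is not an API of this repo. It relies on the `-m`, `-I`, `-O` and `-p` options that OCR-D processors are documented to accept (check each tool's `--help`). The file group names, the shortened step list, and the helpers `build_commands` and `run_workflow` are made up for illustration.

```python
import json
import subprocess

# Each step is (processor executable, parameters or None). A list keeps
# the order and lets the same processor appear more than once.
STEPS = [
    ("ocrd-olena-binarize", {"impl": "sauvola"}),
    ("ocrd-anybaseocr-crop", None),
    ("ocrd-tesserocr-segment-region", None),
]


def build_commands(steps, first_input="OCR-D-IMG", mets="mets.xml"):
    """Turn the steps into one command line (argument list) per processor.

    The file group names (OCR-D-IMG, OCR-D-STEP-01, ...) are assumptions
    for this sketch; any unique names would do.
    """
    commands = []
    input_grp = first_input
    for number, (processor, params) in enumerate(steps, start=1):
        output_grp = f"OCR-D-STEP-{number:02d}"
        cmd = [processor, "-m", mets, "-I", input_grp, "-O", output_grp]
        if params:
            # Parameters are passed as a JSON string.
            cmd += ["-p", json.dumps(params)]
        commands.append(cmd)
        input_grp = output_grp  # the next step reads what this one wrote
    return commands


def run_workflow(steps, workspace_dir):
    """Run the processors in order inside an existing OCR-D workspace."""
    for cmd in build_commands(steps):
        # check=True stops the chain as soon as one processor fails.
        subprocess.run(cmd, cwd=workspace_dir, check=True)


if __name__ == "__main__":
    # Only print the commands; running them needs the tools installed.
    for cmd in build_commands(STEPS):
        print(" ".join(cmd))
```

Calling `run_workflow(STEPS, "/path/to/workspace")` would then execute the chain, provided the directory is an OCR-D workspace whose `mets.xml` already lists the input images. Note that the processors write their results into the workspace rather than returning an image object, so this is not a drop-in replacement for the `ocrd.process(img, config)` call wished for above.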