ModuleNotFoundError: No module named 'unstructured.partition.utils.ocr_models'
jashdalvi opened this issue · 2 comments
jashdalvi commented
I pulled the latest version of the unstructured-api repo. The error is specific to using Paddle for OCR on GPU. These are the steps I followed:
- make install
- pip install onnxruntime-gpu
- pip install paddlepaddle-gpu
- pip install "unstructured.PaddleOCR"
- export ENTIRE_PAGE_OCR=paddle
- export TABLE_OCR=paddle
- make run-web-app
This was working fine with version 0.0.47.
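Before running make run-web-app, one way to check whether the module named in the traceback is present in the installed unstructured package is to probe for it directly. This is a minimal sketch: check_module is a hypothetical helper, not part of the unstructured-api Makefile, and it assumes python3 is the same interpreter the server runs under.

```shell
#!/bin/sh
# Hypothetical helper (not part of unstructured-api):
# returns 0 if the named Python module can be located, 1 otherwise.
check_module() {
  python3 - "$1" <<'EOF'
import importlib.util
import sys

try:
    # find_spec imports parent packages for dotted names, so a missing
    # parent raises ModuleNotFoundError (a subclass of ImportError)
    # instead of returning None.
    found = importlib.util.find_spec(sys.argv[1]) is not None
except (ImportError, ValueError):
    found = False
sys.exit(0 if found else 1)
EOF
}

# Probe the module from the traceback.
if check_module unstructured.partition.utils.ocr_models; then
  echo "unstructured.partition.utils.ocr_models found"
else
  echo "module missing: check which unstructured version is installed (pip show unstructured)" >&2
fi
```

If the probe fails, the installed unstructured package likely does not match the version the API code expects; pip show unstructured reports what is actually installed.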
crapthings commented
How do I get Paddle working? I set:
export ENTIRE_PAGE_OCR=paddle
export TABLE_OCR=paddle
but the request failed with:
{
"detail": "tesseract is not installed or it's not in your PATH. See README file for more information."
}
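The error suggests the request still went through the Tesseract OCR path rather than Paddle. A quick sketch for confirming whether the tesseract binary is visible to the shell that launches the server, using the POSIX command -v lookup; have_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# Returns 0 if the given executable is on PATH, 1 otherwise.
# `command -v` looks the command up without running it.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd tesseract; then
  echo "tesseract found at $(command -v tesseract)"
else
  echo "tesseract is not on PATH; if you meant to use Paddle, check which OCR agent is configured" >&2
fi
```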
yuming-long commented
Hi @crapthings thanks for reaching out!
Sorry about the confusion: the environment variables ENTIRE_PAGE_OCR and TABLE_OCR are being deprecated.
To make sure Paddle is working, you might need to:
- make sure Paddle is installed in your environment; you can run make install-paddleocr from the unstructured repo
- set the OCR_AGENT environment variable to paddle with export OCR_AGENT=paddle
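The environment change above can be sketched as a small file you source before starting the server. This is an illustration under assumptions: the file name ocr_env.sh and the function configure_paddle_ocr are hypothetical, and it only handles the variables mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical ocr_env.sh: drop the deprecated variables and select Paddle
# through OCR_AGENT. Source it (`. ./ocr_env.sh`) so the exports reach the
# shell that later runs `make run-web-app`.
configure_paddle_ocr() {
  for var in ENTIRE_PAGE_OCR TABLE_OCR; do
    # printenv exits 0 only if the variable is set in the environment.
    if printenv "$var" >/dev/null 2>&1; then
      echo "warning: $var is deprecated, use OCR_AGENT instead" >&2
      unset "$var"
    fi
  done
  export OCR_AGENT=paddle
}

configure_paddle_ocr
```

After sourcing it, run make install-paddleocr (if Paddle is not installed yet) and then make run-web-app from the same shell.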