Primary language: Python. License: MIT.

OCR Microservice

This microservice standardizes the usage of Optical Character Recognition (OCR) engines, providing a unified interface to access multiple OCR engines. It currently supports three main OCR engines:

  • PaddleOCR: A deep learning-based OCR engine capable of handling various tasks such as text detection, recognition, and structure analysis.
  • Tesseract: An open-source OCR engine that provides accurate text recognition from images.
  • EasyOCR: Another deep learning-based OCR engine known for its simplicity and ease of use.
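To illustrate what a unified interface over several engines can look like, here is a minimal sketch built around a registry keyed by engine name. It is not the microservice's actual implementation: `OcrEngine`, `register`, `StubEngine`, and `run_ocr` are hypothetical names, and the stub engine stands in for a real adapter so the example runs without any OCR dependency.

```python
from typing import Callable, Protocol


class OcrEngine(Protocol):
    """Hypothetical common interface that every engine adapter would satisfy."""

    def recognize(self, image_bytes: bytes, languages: list[str]) -> list[dict]: ...


# Registry mapping an engine name to a factory that builds the adapter.
_ENGINES: dict[str, Callable[[], OcrEngine]] = {}


def register(name: str):
    """Class decorator that records an adapter under the given name."""
    def decorator(factory):
        _ENGINES[name] = factory
        return factory
    return decorator


@register("stub")
class StubEngine:
    """Stand-in engine; a real adapter would call PaddleOCR, Tesseract, or EasyOCR."""

    def recognize(self, image_bytes: bytes, languages: list[str]) -> list[dict]:
        return [{"x": 0, "y": 0, "w": 0, "h": 0, "content_type": 1,
                 "content": f"{len(image_bytes)} bytes"}]


def run_ocr(method: str, image_bytes: bytes, languages: list[str]) -> dict:
    """Dispatch to the named engine and wrap its elements in one common shape."""
    try:
        engine = _ENGINES[method]()
    except KeyError:
        raise ValueError(f"unknown ocr_method: {method!r}") from None
    return {"elements": engine.recognize(image_bytes, languages)}


result = run_ocr("stub", b"\x89PNG", ["en"])
print(result["elements"][0]["content"])  # → 4 bytes
```

With this layout, callers only ever see one function and one output shape, whichever engine does the work.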

The microservice returns OCR output as a standardized Document, following the structure defined by DocumentAI-std. An example of the output is provided below:

{
  "ocr_result": {
    "filename": "0c5d743d-d936-40ae-9642-c9db27c6155c.png",
    "elements": [
      {
        "x": 48,
        "y": 45,
        "w": 47,
        "h": 20,
        "content_type": 1,
        "content": "STE"
      },
      {
        "x": 104,
        "y": 47,
        "w": 97,
        "h": 20,
        "content_type": 1,
        "content": "SIDMAC"
      }
    ]
  },
  "code": 200,
  "message": "success"
}
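As a hedged illustration of consuming this response, the sketch below parses the JSON into small dataclasses and joins the recognized text. `OcrElement` and `parse_ocr_response` are hypothetical helpers, not part of the microservice or of DocumentAI-std. Reading `x`, `y` as the top-left corner and `w`, `h` as the box width and height is an assumption inferred from the example.

```python
import json
from dataclasses import dataclass


@dataclass
class OcrElement:
    """One recognized region (hypothetical helper, mirrors the example fields)."""
    x: int  # assumed: left edge of the bounding box
    y: int  # assumed: top edge of the bounding box
    w: int  # assumed: box width
    h: int  # assumed: box height
    content_type: int
    content: str


def parse_ocr_response(payload: str) -> list[OcrElement]:
    """Parse a response body shaped like the example above."""
    data = json.loads(payload)
    if data.get("code") != 200:
        raise ValueError(f"OCR request failed: {data.get('message')}")
    # Fails with TypeError if an element carries fields other than the six above.
    return [OcrElement(**element) for element in data["ocr_result"]["elements"]]


SAMPLE = """
{
  "ocr_result": {
    "filename": "0c5d743d-d936-40ae-9642-c9db27c6155c.png",
    "elements": [
      {"x": 48, "y": 45, "w": 47, "h": 20, "content_type": 1, "content": "STE"},
      {"x": 104, "y": 47, "w": 97, "h": 20, "content_type": 1, "content": "SIDMAC"}
    ]
  },
  "code": 200,
  "message": "success"
}
"""

elements = parse_ocr_response(SAMPLE)
text = " ".join(element.content for element in elements)
print(text)  # → STE SIDMAC
```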

Built with

  • Python 3.11: The microservice is developed and run on Python 3.11.
  • FastAPI: Used to build the RESTful API endpoints and to serve the OpenAPI documentation.

Usage

Locally:

  1. Download the repository:
git clone https://github.com/LATIS-DocumentAI-Group/ocr-microservice.git
cd ocr-microservice
  2. Install the requirements:
pip install -r requirements.txt
  3. Run the main file:
python main.py

Using Docker:

  1. Pull the Docker image:
docker pull hamzagbada18/ocr-microservice:latest
  2. Run the Docker container:
docker run -p 8000:8000 --name ocr-api hamzagbada18/ocr-microservice:latest
  3. Access the OpenAPI documentation at http://localhost:8000/docs.
  4. Access the service through the REST API:
  • POST /applyOcr/ (Apply OCR)

Params:

Name | Description
ocr_method | Selects the OCR engine to apply: paddle (PaddleOCR), tesseract (Tesseract), or easy (EasyOCR).
languages | List of languages to recognize. Supported values are fr (French) and en (English). Note: PaddleOCR accepts only one language.
  5. Example usage with curl:
curl -X 'POST' \
  'http://localhost:8000/applyOcr/?ocr_method=tesseract&languages=en' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@invoiceBLUR.png;type=image/png'
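The same request can be sketched in Python. This is a minimal client under stated assumptions, not an official one: `build_apply_ocr_url` and `apply_ocr` are hypothetical helpers, the upload relies on the third-party `requests` package, and passing several languages as repeated `languages=` query keys is an assumption about how the endpoint reads lists.

```python
import mimetypes
import os
from collections.abc import Sequence
from urllib.parse import urlencode

SUPPORTED_METHODS = {"paddle", "tesseract", "easy"}
SUPPORTED_LANGUAGES = {"fr", "en"}


def build_apply_ocr_url(base_url: str, ocr_method: str, languages: Sequence[str]) -> str:
    """Validate the documented parameters and build the /applyOcr/ URL."""
    if ocr_method not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported ocr_method: {ocr_method!r}")
    if not languages:
        raise ValueError("at least one language is required")
    unknown = set(languages) - SUPPORTED_LANGUAGES
    if unknown:
        raise ValueError(f"unsupported languages: {sorted(unknown)}")
    if ocr_method == "paddle" and len(languages) != 1:
        raise ValueError("PaddleOCR accepts only one language")
    # Assumption: multiple languages are sent as repeated query keys.
    query = urlencode({"ocr_method": ocr_method, "languages": list(languages)}, doseq=True)
    return f"{base_url.rstrip('/')}/applyOcr/?{query}"


def apply_ocr(image_path: str, ocr_method: str = "tesseract",
              languages: Sequence[str] = ("en",),
              base_url: str = "http://localhost:8000") -> dict:
    """Upload an image as multipart form data and return the decoded JSON."""
    import requests  # third-party; imported here so URL building works without it

    url = build_apply_ocr_url(base_url, ocr_method, languages)
    mime = mimetypes.guess_type(image_path)[0] or "application/octet-stream"
    with open(image_path, "rb") as handle:
        response = requests.post(
            url,
            files={"file": (os.path.basename(image_path), handle, mime)},
            headers={"accept": "application/json"},
            timeout=60,
        )
    response.raise_for_status()
    return response.json()


url = build_apply_ocr_url("http://localhost:8000", "tesseract", ["en"])
print(url)  # → http://localhost:8000/applyOcr/?ocr_method=tesseract&languages=en
```

With the service running, `apply_ocr("invoiceBLUR.png")` would send the same request as the curl command above.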