Swift OCR: LLM Powered Fast OCR ⚡

🌟 Features

Flexible Input Options: Accepts PDF files via direct upload or by specifying a URL.
Advanced OCR Processing: Utilizes OpenAI's GPT-4 Turbo with Vision model for accurate text extraction.
Performance Optimizations:
- Parallel PDF Conversion: Converts PDF pages to images concurrently using multiprocessing.
- Batch Processing: Processes multiple images in batches to maximize throughput.
- Retry Mechanism with Exponential Backoff: Ensures resilience against transient failures and API rate limits.
Structured Output: Extracted text is formatted using Markdown for readability and consistency.
Robust Error Handling: Comprehensive logging and exception handling for reliable operations.
Scalable Architecture: Asynchronous processing enables handling multiple requests efficiently.

📹 Demo

video.mp4

Demo video showcasing the conversion of NASA's Apollo 17 flight documents, which include unorganized, horizontally and vertically oriented pages, into well-structured Markdown format without any issues.

Here's a single, comprehensive section on cost comparison for your README:

Cost Comparison and Value Proposition

Our solution offers an optimal balance of affordability, accuracy, and advanced features:

Cost Breakdown

Average token usage per image: ~1200
Total tokens per page (including prompt): ~1500
[GPT4O] Input token cost: $5 per million tokens
[GPT4O] Output token cost: $15 per million tokens

For 1000 documents:

Estimated total cost: $15

Cost Optimization Options

Utilizing GPT4 mini: Reduces cost to ~$8 per 1000 documents
Implementing batch API: Further reduces cost to ~$4 per 1000 documents

Market Comparison

This solution is significantly more affordable than alternatives:

Our cost: $15 per 1000 documents
CloudConvert: ~$30 per 1000 documents (PDFTron mode, 4 credits required)

While cost-effectiveness is a major advantage, our solution also provides:

Superior accuracy and consistency
Precise table generation
Output in easily editable markdown format

This combination of affordability and advanced features makes solution stand out in the document processing market. It's not just about being cheaper; it's about providing excellent value through reliability, flexibility, and high-quality output.

🛠️ Installation

Prerequisites

Python 3.8+
Git
Virtualenv (optional but recommended)

Steps

Clone the Repository

git clone https://github.com/yourusername/llm-openai-ocr.git
cd llm-openai-ocr

Create a Virtual Environment

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install Dependencies
```
pip install -r requirements.txt
```

Configure Environment Variables

Create a .env file in the root directory and add the following variables:

OPENAI_API_KEY=your_openai_api_key
AZURE_OPENAI_ENDPOINT=your_azure_openai_endpoint
OPENAI_DEPLOYMENT_ID=your_openai_deployment_id
OPENAI_API_VERSION=your_openai_api_version  # Default is "gpt-4o"
BATCH_SIZE=10  # Optional: Default is 1
MAX_CONCURRENT_OCR_REQUESTS=5  # Optional: Default is 5
MAX_CONCURRENT_PDF_CONVERSION=4  # Optional: Default is 4

Note: Replace your_openai_api_key, your_azure_openai_endpoint, and your_openai_deployment_id with your actual OpenAI credentials.

Run the Application
```
uvicorn main:app --reload
```
The API will be available at http://127.0.0.1:8000.

🎯 Usage

API Endpoint

POST /ocr

Request Parameters

file: (Optional) Upload a PDF file.
ocr_request.url: (Optional) URL of the PDF to process.

You must provide either a file or a URL, not both.

Example Using `curl`

Uploading a PDF File:

curl -X POST "http://127.0.0.1:8000/ocr" -F "file=@/path/to/your/document.pdf"

Providing a PDF URL:

curl -X POST "http://127.0.0.1:8000/ocr" -F "ocr_request={\"url\": \"https://example.com/document.pdf\"}" -H "Content-Type: application/json"

Response

200 OK

{
  "text": "Extracted and formatted text from the PDF."
}

Error Responses
- 400 Bad Request: Invalid input parameters.
- 422 Unprocessable Entity: Validation errors.
- 500 Internal Server Error: Processing errors.

🧰 Configuration

All configurations are managed via environment variables. Ensure you have a .env file set up with the necessary variables as described in the Installation section.

Key Configuration Variables

OPENAI_API_KEY: Your OpenAI API key.
AZURE_OPENAI_ENDPOINT: The endpoint for Azure OpenAI service.
OPENAI_DEPLOYMENT_ID: Deployment ID for the OpenAI model.
OPENAI_API_VERSION: API version for OpenAI (default: "gpt-4o").
BATCH_SIZE: Number of images to process per OCR request (default: 1).
MAX_CONCURRENT_OCR_REQUESTS: Maximum number of concurrent OCR requests (default: 5).
MAX_CONCURRENT_PDF_CONVERSION: Maximum number of concurrent PDF page conversions (default: 4).

📜 License (thx for issue)

This project is released under the MIT License. This means you are free to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the software without restriction, provided that the following copyright notice and permission notice are included in all copies or substantial portions of the software:

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

smartyhouses/swift-ocr-llm-powered-pdf-to-markdown