yigitkonur/swift-ocr-llm-powered-pdf-to-markdown

Can you add ollama support?

shasankp000 opened this issue · 5 comments

Being able to choose one's LLM and implement a local-first solution is highly desirable.

That said, I'm unable to use this tool because I don't have any keys for AZURE-whatever. Why is this even needed? Why can't we just use our OPENAI_API_KEY alone?

You can use this directly with a standard OpenAI key: if you don't set an Azure endpoint, it defaults to the OpenAI base URL. You could even adapt this for Anthropic with minimal tweaks by running the code through ChatGPT for some minor adjustments. It's not too complicated.
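A minimal sketch of what that fallback could look like (the function name `resolve_base_url` and the exact default URL are my assumptions, not the repo's actual code; `AZURE_OPENAI_ENDPOINT` is the variable named later in this thread):

```python
import os

def resolve_base_url() -> str:
    """Use the Azure endpoint if one is configured, otherwise fall
    back to the public OpenAI API base URL (assumed default)."""
    azure = os.getenv("AZURE_OPENAI_ENDPOINT", "").strip()
    if azure:
        return azure
    return "https://api.openai.com/v1"
```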

As far as I know, Ollama provides multimodal support through LLaVA, but it might not be the most performant or consistent option. That's why I haven't added local LLM support yet, but I'll look into it when I have some downtime. I might add it in the future.

How do I use it without the Azure endpoint? If I remove the environment variables or leave them empty, I get:
File "", line 488, in _call_with_frames_removed
File "/workspace/swift-ocr-llm-powered-pdf-to-markdown/main.py", line 52, in
Settings.validate()
File "/workspace/swift-ocr-llm-powered-pdf-to-markdown/main.py", line 47, in validate
raise ValueError(
ValueError: Missing required environment variables: AZURE_OPENAI_ENDPOINT, OPENAI_DEPLOYMENT_ID

or

2024-09-27 09:56:03,922 - main - INFO - Deleted temporary PDF file /tmp/tmpilvez95t.pdf.
2024-09-27 09:56:03,922 - main - ERROR - HTTPException: OCR processing failed: Connection error.
INFO: 127.0.0.1:48952 - "POST /ocr HTTP/1.1" 502 Bad Gateway
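Judging by the traceback, the first error comes from a hard requirement on both Azure variables in `Settings.validate()`. A relaxed check that accepts either setup might look like this (a sketch only; the env-var names come from the traceback and thread, but the structure is my assumption about the fork):

```python
import os

def validate() -> None:
    """Accept either a full Azure configuration or a plain OpenAI key.

    AZURE_OPENAI_ENDPOINT and OPENAI_DEPLOYMENT_ID are the variables
    named in the traceback above; requiring only one of the two setups
    is the assumed fix, not the repo's current behavior.
    """
    has_azure = bool(os.getenv("AZURE_OPENAI_ENDPOINT")) and bool(
        os.getenv("OPENAI_DEPLOYMENT_ID")
    )
    has_openai = bool(os.getenv("OPENAI_API_KEY"))
    if not (has_azure or has_openai):
        raise ValueError(
            "Set OPENAI_API_KEY, or both AZURE_OPENAI_ENDPOINT "
            "and OPENAI_DEPLOYMENT_ID."
        )
```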

I think I might fork the project and add ollama support myself.
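For such a fork, Ollama does expose an OpenAI-compatible API at `http://localhost:11434/v1`, so the change could be as small as swapping the client configuration. A sketch (the `OLLAMA_BASE_URL` and `OLLAMA_MODEL` variable names and the `client_config` helper are hypothetical, not part of this repo):

```python
import os

def client_config() -> dict:
    """Build OpenAI-client settings, routing to Ollama when configured.

    Ollama ignores the API key but the OpenAI client still requires a
    non-empty value, hence the "ollama" placeholder.
    """
    base_url = os.getenv("OLLAMA_BASE_URL")
    if base_url:
        return {
            "base_url": base_url,  # e.g. http://localhost:11434/v1
            "api_key": "ollama",
            "model": os.getenv("OLLAMA_MODEL", "llava"),
        }
    return {
        "base_url": "https://api.openai.com/v1",
        "api_key": os.environ["OPENAI_API_KEY"],
        "model": os.getenv("OPENAI_MODEL", "gpt-4o"),
    }
```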

@slucha Have you figured out how to fix the "OCR processing failed: Connection error" issue?