pdf-ocr: A Python repository from mttias

What is this project about?

This project takes an input PDF and writes its text content to a text file. It can handle larger PDF files by converting each page to its own text file in parallel, then merging the per-page files into a single text file.
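The per-page, parallel-then-merge flow described above could look roughly like the sketch below. This is not the repository's actual code: the function names (pages_to_text_files, merge_text_files), the page_0001.txt naming scheme, the use of a thread pool, and the ocr_page callable (a stand-in for whatever OCR call the script really makes) are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def pages_to_text_files(pages, ocr_page, out_dir, max_workers=4):
    """Write each page's text to its own file in parallel.

    `pages` is any sequence of page objects and `ocr_page` is a hypothetical
    callable that turns one page into a string (assumption: the real script
    uses its own OCR call here). Returns the file paths in page order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    def process(item):
        index, page = item
        path = out_dir / f"page_{index:04d}.txt"
        path.write_text(ocr_page(page), encoding="utf-8")
        return path

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, so pages stay sorted
        # even if they finish out of order.
        return list(pool.map(process, enumerate(pages, start=1)))


def merge_text_files(paths, merged_path):
    """Concatenate the per-page files into one text file, one page after another."""
    merged_path = Path(merged_path)
    with merged_path.open("w", encoding="utf-8") as out:
        for path in paths:
            out.write(Path(path).read_text(encoding="utf-8"))
            out.write("\n")
    return merged_path
```

Writing and reading with an explicit UTF-8 encoding keeps the merge step from depending on the platform's default encoding. A process pool may suit CPU-heavy OCR better than threads; the thread pool is used here only to keep the sketch short.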

How to use this project?

Clone this repository
Install the requirements (pip install -r requirements.txt)
Set the path to your PDF file (the PDF_file variable in main())
Run the script (python main.py)

What is the output?

The output is a single text file containing the text extracted from the PDF.

Improvement areas

The script handles non-English languages and special characters poorly.
