LeoFCardoso/pdf2pdfocr

Zero OCR'ed files

Closed this issue · 4 comments

File: D:\Google_drive_sola\Sola\2022-2023\ROP - Reologija polimerov\RLP - Reologija polimerov.pdf
[2023-01-14 19:20:35.717707] [DEBUG] Tesseract can 'textonly_pdf': True
[2023-01-14 19:20:35.733704] [DEBUG] Tesseract version: 5
[2023-01-14 19:20:35.736704] [DEBUG] cuneiform not available
[2023-01-14 19:20:35.781705] [DEBUG] Pdftoppm version: 22.12.0
[2023-01-14 19:20:35.811712] [DEBUG] Qpdf version: 11.2.0
[2023-01-14 19:20:35.811712] [DEBUG] Temp dir is C:\Users\ADMINI~1\AppData\Local\Temp\pdf2pdfocr_L3VRF
[2023-01-14 19:20:35.811712] [DEBUG] Prefix is L3VRF
[2023-01-14 19:20:35.811712] [DEBUG] Script dir is c:\Users\Administrator\anaconda3\Scripts
[2023-01-14 19:20:35.812712] [DEBUG] Parallel operations will use 20 CPUs
[2023-01-14 19:20:35.861715] [LOG] Welcome to pdf2pdfocr version 1.12.0 marapurense - https://github.com/LeoFCardoso/pdf2pdfocr
[2023-01-14 19:20:35.903716] [LOG] Input file D:\Google_drive_sola\Sola\2022-2023\ROP - Reologija polimerov\RLP - Reologija polimerov.pdf: type is application/pdf
[2023-01-14 19:20:35.918716] [DEBUG] User conversion params: best
[2023-01-14 19:20:35.918716] [DEBUG] Output file: D:\Google_drive_sola\Sola\2022-2023\ROP - Reologija polimerov\RLP - Reologija polimerov-OCR.pdf for PDF and D:\Google_drive_sola\Sola\2022-2023\ROP - Reologija polimerov\RLP - Reologija polimerov-OCR.pdf.txt for TXT
[2023-01-14 19:20:35.918716] [LOG] Converting input file to images...
[2023-01-14 19:20:43.633767] [LOG] Checking blank pages
C:\Users\Administrator\anaconda3\lib\site-packages\PIL\Image.py:3074: DecompressionBombWarning: Image size (105023996 pixels) exceeds limit of 89478485 pixels, could be decompression bomb DOS attack.
warnings.warn(
[2023-01-14 19:20:44.652767] [LOG] Starting OCR with tesseract...
[2023-01-14 19:20:45.154768] [LOG] OCR completed
[2023-01-14 19:20:45.155767] [DEBUG] We have 0 ocr'ed files
Error: No PDF files generated after OCR. This is not expected. Aborting.

Can you please share the input file?

https://drive.google.com/open?id=1bjsNURMOBqGr-fpm3HT1XfmFVOKRTueF&authuser=ph6912%40student.uni-lj.si&usp=drive_fs

Just out of curiosity, is the installation OK?

The PDF was exported from the note-taking app Inkodo, from the Microsoft Store.

Hello @PatrikHlebecStor.

Your PDF worked for me. :(

Please try adding "-r 200" to the command line (this decreases the image resolution and should get rid of the DecompressionBombWarning).
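
For reference, here is a rough sketch of why a lower resolution should silence the warning. The pixel count of a rendered page grows with the square of the DPI, so going from a higher DPI down to 200 shrinks the image considerably. The two pixel figures are taken from the warning in the log above; the starting value of 300 DPI is only an assumption (the log does not say which resolution was used), and `scaled_pixel_count` is a hypothetical helper, not part of pdf2pdfocr or Pillow.

```python
# Pillow's default pixel limit, as quoted in the DecompressionBombWarning above.
PIL_DEFAULT_MAX_PIXELS = 89_478_485

# Image size reported by the warning in the log.
REPORTED_PIXELS = 105_023_996


def scaled_pixel_count(pixels: int, old_dpi: int, new_dpi: int) -> int:
    """Estimate the pixel count of the same page re-rendered at another DPI.

    Width and height both scale linearly with DPI, so the total pixel
    count scales with the square of the DPI ratio.
    """
    return round(pixels * (new_dpi / old_dpi) ** 2)


# ASSUMPTION: the page was rendered at 300 DPI; the log does not state this.
assumed_dpi = 300

estimate = scaled_pixel_count(REPORTED_PIXELS, assumed_dpi, 200)
print(f"Estimated pixels at 200 DPI: {estimate}")
print(f"Below Pillow's default limit: {estimate <= PIL_DEFAULT_MAX_PIXELS}")
```

Under that assumption the estimate lands well below the limit. Note that Pillow only warns at this size (it raises an error once an image exceeds twice the limit), so the warning may or may not be related to the "0 ocr'ed files" result.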

Can other PDF files be OCRed in your installation?

Closing due to inactivity.