eihli/image-table-ocr

Version of the external requirements

sebastiankmilo opened this issue · 6 comments

First, thanks for this package; it looks amazing.

help

Which version should I install of:

  • pdfimages from Poppler
  • Tesseract
  • mogrify from ImageMagick

eihli commented

Thanks for the feedback. I'll hopefully get around to updating the documentation soon. In the meantime:

➜  prhyme git:(master) ✗ mogrify --version
Version: ImageMagick 7.0.10-26 Q16 x86_64 2020-08-09 https://imagemagick.org
Copyright: © 1999-2020 ImageMagick Studio LLC
License: https://imagemagick.org/script/license.php
Features: Cipher DPC HDRI Modules OpenMP(4.5)
Delegates (built-in): bzlib cairo djvu fontconfig freetype heic jbig jng jp2 jpeg lcms lqr ltdl lzma openexr pangocairo png raqm raw rsvg tiff webp wmf x xml zlib
➜  prhyme git:(master) ✗ pdfimages -v
pdfimages version 0.90.1
Copyright 2005-2020 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
➜  prhyme git:(master) ✗ tesseract -v
tesseract 5.0.0-alpha-647-g4a00b
 leptonica-1.80.0
  libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.0.4) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.1.0 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found OpenMP 201511
 Found libarchive 3.4.3 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4
 Found libcurl/7.72.0 OpenSSL/1.1.1g zlib/1.2.11 zstd/1.4.5 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh2/1.9.0 nghttp2/1.41.0
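
For anyone who would rather collect these versions from Python than type the three commands by hand, here is a minimal sketch. It assumes the same version flags shown in the transcript above; `VERSION_COMMANDS` and `tool_version` are hypothetical names for illustration and are not part of image-table-ocr.

```python
import subprocess

# Version commands as they appear in the transcript above.
VERSION_COMMANDS = {
    "mogrify": ["mogrify", "--version"],
    "pdfimages": ["pdfimages", "-v"],
    "tesseract": ["tesseract", "-v"],
}


def tool_version(command):
    """Return the first line of a tool's version output, or None if it cannot be run."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some tools print their version to stderr
            universal_newlines=True,
            check=False,
        )
    except OSError:
        # Raised when the executable is missing or cannot be started.
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    for name, command in VERSION_COMMANDS.items():
        print(name, "->", tool_version(command) or "not found")
```

A tool that is not installed shows up as "not found" instead of raising an error, so the script can be run before anything has been set up.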
eihli commented

Added the external versions that I've used locally to the README.

Hello. I am a newbie in Python. Could you tell me how to install all of these for Python? For instance, I have tried:

C:\Program Files\Python36>pip install --upgrade poppler
Collecting poppler
  Could not find a version that satisfies the requirement poppler (from versions: )
No matching distribution found for poppler
You are using pip version 18.1, however version 20.2.4 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

I guess that is not the right way...

eihli commented

I've never used these tools on Windows. There might be a bit of a learning curve with some of this, but don't be discouraged.

Poppler isn't a Python package, so pip can't install it. It's a program of its own: https://poppler.freedesktop.org/ That website has a link to a Windows version: https://ci.appveyor.com/project/tsdgeos/poppler-mirror
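
Because these are standalone programs, what matters to Python is whether their executables can be found on your PATH once they are installed. A minimal sketch of that check, using the standard library's `shutil.which`; `REQUIRED_TOOLS` and `missing_tools` are hypothetical names for illustration, and the executable names are assumed to match the commands mentioned in this thread:

```python
import shutil

# Executable names assumed from the commands mentioned in this thread.
REQUIRED_TOOLS = ["pdfimages", "tesseract", "mogrify"]


def missing_tools(names=REQUIRED_TOOLS):
    """Return the tools from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools were found.")
```

If a tool is installed but still reported as missing, the folder containing its executable probably has not been added to PATH yet.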

The following note may not be necessary, but I want to mention it so you'll at least be aware: if you find yourself needing tools that don't exist for Windows, you can try something like Cygwin, which will give you some Linux-like abilities at the Windows command line. https://www.cygwin.com/ But that's another big learning curve. I wouldn't explore it until other options have been exhausted, and only if you're willing to spend time spinning your wheels, because you'll end up learning about building from source and hunting down lots of dependencies.

Just to be sure: do I need to have Poppler, Tesseract, and ImageMagick installed in order to run your Python code? Or is it enough to have Tesseract and ImageMagick as Python packages?

eihli commented

It depends on what you want to do. Poppler is needed if you want to extract images from PDF files. ImageMagick is needed if you need to rotate images, for example if they are rotated 90 degrees sideways. Tesseract is needed for recognizing characters in the table cells.
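
To make that mapping concrete, here is a rough sketch of how each task corresponds to one command-line call from Python. This is an illustration under assumptions, not how image-table-ocr itself invokes the tools: `build_command` and `run_task` are hypothetical helpers, and the flags (`pdfimages -png`, `mogrify -rotate 90`, `tesseract <image> stdout`) are just one common way to call each program.

```python
import subprocess


def build_command(task, path, output_prefix="page"):
    """Return the command line (as a list) for one task on one input file."""
    if task == "extract_images":
        # Poppler: write the images embedded in a PDF as PNG files named <prefix>-NNN.png.
        return ["pdfimages", "-png", path, output_prefix]
    if task == "rotate":
        # ImageMagick: rotate the image 90 degrees clockwise, overwriting the file.
        return ["mogrify", "-rotate", "90", path]
    if task == "ocr":
        # Tesseract: recognize text in the image and print it to standard output.
        return ["tesseract", path, "stdout"]
    raise ValueError("unknown task: {}".format(task))


def run_task(task, path):
    """Run the external tool for `task`; raises if the tool is missing or fails."""
    return subprocess.run(build_command(task, path), check=True)
```

For example, `run_task("ocr", "cell.png")` would only need Tesseract to be installed, so you can skip Poppler if you never start from PDFs and skip ImageMagick if your images are already upright.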