Exception during read_pdf: SEVERE: Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed
drjubbs opened this issue · 1 comment
Summary of your issue
tabula-py throws an exception when processing an image-dense PDF
Check list before submit
- Did you read FAQ?
- (Optional, but really helpful) Your PDF URL: https://api.environdec.com/api/v1/EPDLibrary/Files/6410b3fe-07f0-4766-b6b0-08da4d16d787/Data
- Paste the output of import tabula; tabula.environment_info() on Python REPL:
Python version:
3.10.4 | packaged by conda-forge | (main, Mar 30 2022, 08:38:02) [MSC v.1916 64 bit (AMD64)]
Java version:
java version "18.0.2" 2022-07-19
Java(TM) SE Runtime Environment (build 18.0.2+9-61)
Java HotSpot(TM) 64-Bit Server VM (build 18.0.2+9-61, mixed mode, sharing)
tabula-py version: 2.4.0
platform: Windows-10-10.0.19044-SP0
uname:
uname_result(system='Windows', node='windows-knime', release='10', version='10.0.19044', machine='AMD64')
linux_distribution: ('', '', '')
mac_ver: ('', ('', '', ''), '')
What did you do when you faced the problem?
I was able to confirm the library works fine on simple PDFs lacking images.
Code:
tables4 = tabula.read_pdf(r"E:\PDF\celsa_circular_steel_EPD_Special_Steel_Wire.pdf", pages="all")
Expected behavior:
read_pdf should return an array of tables
Actual behavior:
Got stderr: Jul 29, 2022 3:51:02 PM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
SEVERE: Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed
Jul 29, 2022 3:51:02 PM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
SEVERE: Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed
Jul 29, 2022 3:51:02 PM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
SEVERE: Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed
Jul 29, 2022 3:51:04 PM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
SEVERE: Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed
Jul 29, 2022 3:51:04 PM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
SEVERE: Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed
Jul 29, 2022 3:51:05 PM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
SEVERE: Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed
Jul 29, 2022 3:51:05 PM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
Related Issues:
None
This is not a tabula-py issue: the message is a warning from PDFBox. See FAQ: https://tabula-py.readthedocs.io/en/latest/faq.html#i-got-a-warning-error-message-from-pdfbox-including-org-apache-pdfbox-pdmodel-is-it-the-cause-of-empty-dataframe
In this case, guess=True (the default) somehow causes empty results. Try other options, such as guess=False or stream=True.
In [1]: import tabula
In [4]: fname = "Data.pdf"
In [5]: tabula.read_pdf(fname, pages=6, guess=False)
Got stderr: Aug 06, 2022 5:35:40 PM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
SEVERE: Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed
Aug 06, 2022 5:35:40 PM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
SEVERE: Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed
Out[5]:
[ 3. Product Product description Unnamed: 0
0 information Global Steel Wire is one of Europe’s leading s... NaN
1 NaN wire rod, with an extensive range of NaN
2 NaN grades and diameters ranging from NaN
3 NaN 5.5 mm up to 52 mm, manufactured NaN
4 NaN in accordance with international NaN
5 NaN standards and tailored to our custo- NaN
6 NaN mers’ technical specifications. NaN
7 NaN Wire rod is available in low, medium and high NaN
8 NaN carbon steels (between 0.05% and 1.25%) with NaN
9 NaN different alloy grades (Al, B, Cr, Mn, Mo, P, Si, NaN
10 NaN S, among others), depending on its composi- NaN
11 NaN tion and characteristics. NaN
12 NaN In addition, wire rod can be supplied in round NaN
13 NaN or hexagonal section, in different coil formats NaN
14 NaN and with a wide variety of heat and surface NaN
15 NaN treatments. Among its extensive range of NaN
16 NaN products, Global Steel Wire specialises in NaN
17 Product name: wire rods for cold heading, tyre reinforcement, NaN
18 NaN suspension springs, free cutting and cold NaN
19 Hot-rolled steel wire rod drawing, as well as low, medium and high NaN
20 NaN carbon steels. NaN
21 Product identification: Global Steel Wire is present in all sectors NaN
22 NaN where wire rod based products are manufac- NaN
23 Hot-rolled special steel wire tured, and has become one of the European NaN
24 manufactured in electric arc leaders in sectors with high technological NaN
25 furnace based on scrap. demands, especially in the automotive sector. NaN
26 NaN NaN 10.0]
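
The trial-and-error suggested in this thread (default options first, then guess=False, then stream=True) could be automated along the following lines. This is only a sketch, not part of tabula-py: the helper names has_data, first_nonempty_tables, OPTION_SETS and read_with_fallbacks are hypothetical, and the order of the option sets is an assumption. The only tabula-py call used is tabula.read_pdf with its pages, guess and stream parameters.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def has_data(tables: Sequence[Any]) -> bool:
    """Return True if at least one extracted table has one or more rows."""
    return any(len(table) > 0 for table in tables)


def first_nonempty_tables(
    read: Callable[..., List[Any]],
    path: str,
    option_sets: Sequence[Dict[str, Any]],
    **common: Any,
) -> Tuple[Dict[str, Any], List[Any]]:
    """Call `read` with each option set in turn and return the first
    (options, tables) pair that yields data, or ({}, []) if none does."""
    for options in option_sets:
        tables = read(path, **common, **options)
        if has_data(tables):
            return options, tables
    return {}, []


# Option sets to try, in order (an assumed order, not a recommendation):
# the default behaviour first, then the alternatives named in this thread.
OPTION_SETS = [
    {"guess": True},
    {"guess": False},
    {"stream": True},
]


def read_with_fallbacks(path: str, pages: Any = "all"):
    """Hypothetical convenience wrapper around tabula.read_pdf."""
    import tabula  # imported lazily: requires tabula-py and a Java runtime

    return first_nonempty_tables(tabula.read_pdf, path, OPTION_SETS, pages=pages)
```

For the file discussed here, a call such as read_with_fallbacks("Data.pdf", pages=6) would be expected to skip the empty default result and return the guess=False tables, though the PDFBox JPEG2000 messages would still appear on stderr.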