chezou/tabula-py

Reading a PDF from a URL does not work?!

gittesti opened this issue · 1 comment

Summary of your issue

A simple PDF read from a URL returns an EMPTY list?!

read_pdf("https://training.refinitiv.com/portal/docs/pdf/raymondjames/Thomson_One_Exchange_List.pdf",
pages="all", multiple_tables=True)

Check list before submit

If it is not possible to execute tabula.environment_info(), please answer the following questions manually.

  • [x] Python 3.10.6
  • [x] Windows 11 Pro 22000.1042

What did you do when you faced the problem?

Code:

from tabula import read_pdf
df = read_pdf("https://training.refinitiv.com/portal/docs/pdf/raymondjames/Thomson_One_Exchange_List.pdf",
                      pages="all", multiple_tables=True)

Expected behavior:

Get DataFrame / Tables

Actual behavior:

result = {list} []
 __len__ = {int} 0

Related Issues:

tabula-java and tabula-py only support tables that are stored as text in a PDF. This PDF appears to contain no text; it consists of images.
That is entirely out of scope for this project.
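
For anyone hitting the same empty list, a quick way to check whether a PDF has a text layer at all before handing it to tabula is sketched below. This is only a rough check and not part of tabula-py: the helper names `has_text_layer` and `extract_page_texts` and the `min_chars` threshold are made up for illustration, and the sketch assumes the third-party `pypdf` package is installed and that the PDF has already been downloaded to a local file.

```python
from typing import Iterable, List


def has_text_layer(page_texts: Iterable[str], min_chars: int = 20) -> bool:
    """Return True if the extracted page texts hold at least
    min_chars non-whitespace characters in total.

    min_chars is an arbitrary threshold chosen for this sketch.
    """
    total = sum(len("".join(text.split())) for text in page_texts if text)
    return total >= min_chars


def extract_page_texts(pdf_path: str) -> List[str]:
    """Extract the text of every page of a local PDF using pypdf."""
    # Imported lazily so has_text_layer() can be used without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can yield an empty result for image-only pages.
    return [page.extract_text() or "" for page in reader.pages]
```

Calling `has_text_layer(extract_page_texts("Thomson_One_Exchange_List.pdf"))` on a locally saved copy would be expected to return `False` if the file really contains only images. In that case the PDF would need to go through an OCR tool first, since tabula has no text to work with.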