Reading a PDF from a URL does not work?!
gittesti opened this issue · 1 comments
gittesti commented
Summary of your issue
A simple PDF read from a URL returns an EMPTY list?!
read_pdf("https://training.refinitiv.com/portal/docs/pdf/raymondjames/Thomson_One_Exchange_List.pdf",
         pages="all", multiple_tables=True)
Check list before submit
- [x] Did you read FAQ?
- [x] (Optional, but really helpful) Your PDF URL: https://training.refinitiv.com/portal/docs/pdf/raymondjames/Thomson_One_Exchange_List.pdf
If not possible to execute tabula.environment_info(), please answer the following questions manually.
- [x] Python 3.10.6
- [x] Windows 11 Pro 22000.1042
What did you do when you faced the problem?
Code:
from tabula import read_pdf
df = read_pdf("https://training.refinitiv.com/portal/docs/pdf/raymondjames/Thomson_One_Exchange_List.pdf",
              pages="all", multiple_tables=True)
Expected behavior:
Get DataFrame / Tables
Actual behavior:
result = {list} []
__len__ = {int} 0
Related Issues:
chezou commented
tabula-java and tabula-py only support tables that are stored as text in a PDF. This PDF appears to contain no text at all; it consists of images.
That's totally out of scope for this project.
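For anyone hitting the same empty list: one way to confirm this diagnosis before calling read_pdf is to check whether the PDF has any extractable text. The sketch below is only an illustration and is not part of tabula-py. It assumes the PDF has already been downloaded to a local file and that the separate pypdf package is installed. The helper names extract_page_texts and has_text_layer, and the min_chars threshold, are hypothetical.

```python
from typing import Iterable, List


def extract_page_texts(pdf_path: str) -> List[str]:
    """Return the extracted text of every page (empty string where there is none)."""
    # Assumption: pypdf is installed separately; it is not bundled with tabula-py.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def has_text_layer(page_texts: Iterable[str], min_chars: int = 20) -> bool:
    """Heuristic: True if the pages hold at least min_chars non-whitespace characters."""
    # Strip all whitespace so that pages containing only blanks count as empty.
    total = sum(len("".join(text.split())) for text in page_texts)
    return total >= min_chars


# Usage (hypothetical local file name):
#   texts = extract_page_texts("Thomson_One_Exchange_List.pdf")
#   if not has_text_layer(texts):
#       print("No text layer found; tabula-py will likely return an empty list.")
```

If the check reports no text layer, the pages are most likely scanned images, and the tables would first have to be recovered with an OCR tool, which is outside what tabula-py does.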