SamEdwardes/spacypdfreader

cannot import pdf_reader

Closed this issue · 2 comments

Following the instructions for spacypdfreader
import spacy from spacypdfreader import pdf_reader
I get the following error message:
Traceback (most recent call last):
  File "/Users/my_name/apprendre-dev/pdfreader/spacy.py", line 1, in <module>
    import spacy
  File "/Users/my_name//apprendre-dev/pdfreader/spacy.py", line 2, in <module>
    from spacypdfreader import pdf_reader
ImportError: cannot import name 'pdf_reader' from 'spacypdfreader' (/Users/my_name//apprendre-dev/pdfreader/venv/lib/python3.10/site-packages/spacypdfreader/__init__.py)
I get exactly the same result on Colab and conda. Has there been a change in the package that is not yet reflected in the user guide?

Thank you for creating this issue. It looks to me like the issue is that you have two import statements in one line.

Can you please try this:

import spacy

from spacypdfreader import pdf_reader

nlp = spacy.load("en_core_web_sm")
doc = pdf_reader("tests/data/test_pdf_01.pdf", nlp)

# Get the page number of any token.
print(doc[0]._.page_number)  # 1
print(doc[-1]._.page_number)  # 4

# Get page meta data about the PDF document.
print(doc._.pdf_file_name)  # "tests/data/test_pdf_01.pdf"
print(doc._.page_range)  # (1, 4)
print(doc._.first_page)  # 1
print(doc._.last_page)  # 4

# Get all of the text from a specific PDF page.
print(doc._.page(4))  # "able to display the destination page (unless..."

If you continue to have issues, please also share:

  • The operating system and version you are using
  • Python version
  • Package versions
python --version
pip freeze
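The details requested above can also be collected from within Python. The following is only a minimal sketch: the helper name `environment_report` is hypothetical, and it assumes the packages of interest are installed under the distribution names `spacy` and `spacypdfreader`.

```python
import importlib.metadata
import platform
import sys


def environment_report(packages):
    """Return a short text report: OS, Python version, and package versions."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    for name in packages:
        try:
            version = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # The distribution is not installed in the active environment.
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)


# Distribution names assumed for illustration.
print(environment_report(["spacy", "spacypdfreader"]))
```

Pasting the printed report into the issue gives the same information as `python --version` and a filtered `pip freeze`.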

Problem finally solved.
I created a new venv.
python --version
Python 3.9.13

I use VS Code on macOS Ventura 13.3.1.
I had to install a bunch of additional packages each time an error was thrown. You'll find them in the attached requirements.txt
In addition, I had to slightly adapt the import, following what may be a recent change on the spaCy page for the spacypdfreader project:
from spacypdfreader.spacypdfreader import pdf_reader

Then everything worked.
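Since both import paths appear in this thread, a script that needs to run against either package layout could try them in order. This is only a sketch under that assumption: the helper `import_first` is hypothetical, and which path works depends on the installed spacypdfreader version.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that provides it."""
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{path}: {exc}")
    raise ImportError(f"cannot import {name!r}; tried: " + "; ".join(errors))


# The two module paths mentioned in this thread, tried in order.
candidates = ["spacypdfreader", "spacypdfreader.spacypdfreader"]
try:
    pdf_reader = import_first("pdf_reader", candidates)
except ImportError as exc:
    # Neither path worked, e.g. because the package is not installed.
    print(exc)
```

If neither path works, the collected error messages show why each candidate failed, which is useful to include in a bug report.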