SamEdwardes/spacypdfreader

Can't install in Colab environment

Closed this issue · 5 comments

Can anyone help me with this? I'm trying to install spacypdfreader in Google Colab, and it returns the following error message:
Error message

I used this last week and it was working; now I don't know how to proceed.
PS: I already installed the spacy package.

It is strange that it worked last week but not now.

What version of Python are you using in Google Colab? Is the notebook public? If so, can you share a link?

I think I see the issue:

https://github.com/SamEdwardes/spaCyPDFreader/blob/ce083bc5b61b06084c818b7d243f3c9210274442/pyproject.toml#L11-L15

spacypdfreader currently requires python = "^3.9", but Google Colab is on Python 3.7:

image

I think spacypdfreader should be able to work on Python 3.7. I will update the requirement to ^3.7 and check if it works.
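
To illustrate why the install fails: a Poetry caret constraint such as `^3.9` is equivalent to `>=3.9,<4.0`, so a 3.7 interpreter is rejected. Below is a minimal sketch of that check. The helper name `satisfies_caret` is hypothetical, and it only handles `major.minor` constraints with a nonzero major version.

```python
import sys


def satisfies_caret(version, minimum):
    """Return True if `version` satisfies a caret constraint on `minimum`.

    Both arguments are (major, minor) tuples. For a nonzero major version,
    ^X.Y means >=X.Y and <(X+1).0.
    """
    upper_bound = (minimum[0] + 1, 0)
    return minimum <= tuple(version[:2]) < upper_bound


# Compare the running interpreter against the old and the proposed constraint.
current = sys.version_info[:2]
print("Python", ".".join(map(str, current)))
print("satisfies ^3.9:", satisfies_caret(current, (3, 9)))
print("satisfies ^3.7:", satisfies_caret(current, (3, 7)))
```

Running this in a Colab cell before installing shows whether the interpreter falls inside the declared range.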

Maybe I was mistaken about running it on Google Colab, and I actually ran it on my laptop. You're probably right. It would be great if this problem could be fixed for Python 3.7. I will watch for updates. Thanks for replying.

I just closed a PR (#2) that should fix the issue. It now works for me on Google Colab. You can try this:

!python --version
!pip install spacypdfreader
!python -m spacy download "en_core_web_sm"

import requests

import spacy
from spacypdfreader import pdf_reader

# Download a PDF.
url = "https://github.com/SamEdwardes/spaCyPDFreader/raw/main/tests/data/test_pdf_01.pdf"
response = requests.get(url)
with open('test.pdf', 'wb') as f:
    f.write(response.content)

nlp = spacy.load("en_core_web_sm")
doc = pdf_reader("test.pdf", nlp)
print(doc)
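
If the download step is a concern, a small check like the following could confirm that the file really is a PDF before it is passed to `pdf_reader`. This is only a sketch using the standard library. `looks_like_pdf` and `fetch_pdf` are hypothetical helper names, not part of spacypdfreader, and the check assumes the file starts with the usual `%PDF-` header.

```python
from urllib.request import urlopen


def looks_like_pdf(data: bytes) -> bool:
    # PDF files normally begin with the magic bytes "%PDF-".
    return data.startswith(b"%PDF-")


def fetch_pdf(url: str, path: str, timeout: float = 30.0) -> str:
    """Download `url` to `path`, failing loudly if the body is not a PDF."""
    # urlopen raises an HTTPError for 4xx/5xx responses.
    with urlopen(url, timeout=timeout) as response:
        data = response.read()
    if not looks_like_pdf(data):
        raise ValueError(f"{url} did not return a PDF")
    with open(path, "wb") as f:
        f.write(data)
    return path
```

With this in place, `fetch_pdf(url, "test.pdf")` would replace the `requests.get` and `open` lines above, and an HTML error page would raise instead of being saved as `test.pdf`.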

Thank you! Now it's working fine.