heroku/heroku-buildpack-python

Cannot import NLTK stopwords through JupyterHub

gowrishec opened this issue · 3 comments

I am using the code below to load stopwords through JupyterHub. I have hosted JupyterHub on an AWS DLAMI (Deep Learning AMI) Linux server.

$ python3 -m nltk.downloader stopwords
$ python3 -m nltk.downloader words
$ python3 -m nltk.downloader punkt

$ python3
>>> from nltk.corpus import stopwords
>>> stop_words = set(stopwords.words("english"))
>>> print(stop_words)

This works fine when run in a Python terminal.

But when I run the same code in a Jupyter notebook, it fails with this error:
Resource stopwords not found.
Please use the NLTK Downloader to obtain the resource:

When I try to download it from a python3 terminal, I see it is already up to date:

>>> import nltk
>>> nltk.download('stopwords')
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
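The log shows the corpus was downloaded to /root/nltk_data, so the terminal session ran as root. One possible explanation, which is an assumption and not confirmed in this thread, is that the notebook kernel runs as a different JupyterHub user whose NLTK search path points at its own home directory instead. The stdlib-only sketch below can be run in both the terminal and a notebook to see which candidate directory actually holds the corpus from each side; `find_stopwords` and the candidate list are hypothetical.

```python
import os


def find_stopwords(candidate_dirs):
    """Return the first directory in candidate_dirs that holds the stopwords corpus, else None.

    Hypothetical helper: it only looks for corpora/stopwords (unzipped) or
    corpora/stopwords.zip, the layout the NLTK downloader normally writes.
    os.path.isdir/isfile return False on permission errors, so an unreadable
    directory (e.g. /root for a non-root user) is reported as "not found".
    """
    for base in candidate_dirs:
        corpora = os.path.join(base, "corpora")
        if os.path.isdir(os.path.join(corpora, "stopwords")) or os.path.isfile(
            os.path.join(corpora, "stopwords.zip")
        ):
            return base
    return None


# Assumed candidates: where the terminal download went, and the
# current user's own home directory.
candidates = ["/root/nltk_data", os.path.expanduser("~/nltk_data")]
print("home directory:", os.path.expanduser("~"))
print("stopwords found in:", find_stopwords(candidates))
```

If the terminal reports /root/nltk_data but the notebook reports None, the two sessions are most likely looking in different places rather than the data being out of date.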

But when I try the download through JupyterHub, it times out. Ideally no download should be needed, since the package is already up to date. Is there a configuration setting in JupyterHub to handle this?
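On the configuration question: NLTK itself (rather than JupyterHub) reads the `NLTK_DATA` environment variable, a list of directories separated by `os.pathsep`, when `nltk` is first imported, and searches those directories before its built-in defaults. A minimal sketch, assuming the corpus has been placed in a directory every notebook user can read; the path `/opt/shared/nltk_data` and the helper `add_nltk_data_dir` are hypothetical. Alternatively, downloading once into a location NLTK already searches on Linux, such as `/usr/local/share/nltk_data` (for example with `nltk.download('stopwords', download_dir='/usr/local/share/nltk_data')`), avoids any extra configuration.

```python
import os


def add_nltk_data_dir(directory, env=None):
    """Put `directory` first in the NLTK_DATA environment variable (hypothetical helper).

    NLTK builds nltk.data.path from NLTK_DATA when it is first imported, so
    this must run before `import nltk` in the kernel.
    """
    env = os.environ if env is None else env
    entries = [d for d in env.get("NLTK_DATA", "").split(os.pathsep) if d]
    if directory in entries:
        entries.remove(directory)  # avoid duplicates; re-add at the front
    entries.insert(0, directory)
    env["NLTK_DATA"] = os.pathsep.join(entries)
    return env["NLTK_DATA"]


# Assumed shared location, readable by every notebook user.
add_nltk_data_dir("/opt/shared/nltk_data")

# Afterwards, in the same kernel:
#   from nltk.corpus import stopwords
#   stop_words = set(stopwords.words("english"))
```

If `nltk` is already imported in the running kernel, changing the environment variable has no effect; `nltk.data.path` is a plain Python list, so `nltk.data.path.insert(0, "/opt/shared/nltk_data")` adjusts the live search order instead.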

Hi! This doesn't seem like an issue with the buildpack but rather with JupyterHub, and so not a Heroku issue?

The NLTK feature of this buildpack is also something we likely wouldn't add native support for if we were starting from scratch, given that the same downloads could be run from the bin/post_compile scripts. As such, we are more likely to remove the built-in support than to add more advanced support for it.

I faced the same issue, and I think the reason is that the stopwords identifier is shown as out of date in the NLTK downloader.


Closing since this appears to be an upstream issue.