Issue with downloading inaugural corpus
pratos opened this issue · 4 comments
Hi,
[x] Searched Stack Overflow for any existing issues
[x] Searched nltk_data open and closed issues
I tried to install the inaugural corpus using python -m nltk.downloader inaugural, but ran into this problem:
[nltk_data] Downloading package inaugural to
[nltk_data] /Users/prthamesh/nltk_data...
[nltk_data] Unzipping corpora/inaugural.zip.
[nltk_data] [Errno 21] Is a directory:
[nltk_data] '/Users/prthamesh/nltk_data/corpora/1789-Washington.tx
[nltk_data] t'
Error installing package. Retry? [n/y/e]
y
[nltk_data] Downloading package inaugural to
[nltk_data] /Users/prthamesh/nltk_data...
[nltk_data] Unzipping corpora/inaugural.zip.
[nltk_data] [Errno 21] Is a directory:
[nltk_data] '/Users/prthamesh/nltk_data/corpora/1789-Washington.tx
[nltk_data] t'
Error installing package. Retry? [n/y/e]
y
[nltk_data] Downloading package inaugural to
[nltk_data] /Users/prthamesh/nltk_data...
[nltk_data] Unzipping corpora/inaugural.zip.
[nltk_data] [Errno 21] Is a directory:
[nltk_data] '/Users/prthamesh/nltk_data/corpora/1789-Washington.tx
[nltk_data] t'
Error installing package. Retry? [n/y/e]
I tested this on a Mac M1 (2021 edition) and on Ubuntu 20.04 (GitHub CI runner), and hit the same error on both operating systems.
@pratos Hey! I investigated this a little bit. We updated our inaugural corpus about 8 hours ago.
The updated corpus is in a slightly different format than before, but I don't have issues on Windows.
However, on Google Colab I do get these errors. They were (at least partially) resolved by updating nltk: pip install -U nltk.
Perhaps this would work in your case. Let us know.
@stevenbird @nimbusaeta The recent update to inaugural includes some changes which might also be related:
- 2021-Biden.txt has Windows line endings, while all other files have Unix line endings.
- The zip directly contains the .txt files, while previously the .zip contained a folder containing the .txt files.
If simply updating nltk doesn't help, then we might want to revert (assuming the old version did work!).
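For anyone who wants to check the two differences listed above on a local copy of the archive, here is a minimal sketch using only the standard library. The helper names (top_level_layout, line_ending_style, make_zip) are made up for illustration and are not part of nltk, and the in-memory archives only stand in for the real inaugural.zip.

```python
import io
import zipfile


def top_level_layout(zf: zipfile.ZipFile) -> str:
    """Return 'flat' if any file sits at the archive root, else 'nested'."""
    # Skip explicit directory entries, which end with a slash.
    names = [n for n in zf.namelist() if not n.endswith("/")]
    return "flat" if any("/" not in n for n in names) else "nested"


def line_ending_style(data: bytes) -> str:
    """Classify raw file bytes as 'crlf', 'lf', 'mixed', or 'none'."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # bare LF, not part of a CRLF pair
    if crlf and lf:
        return "mixed"
    if crlf:
        return "crlf"
    if lf:
        return "lf"
    return "none"


def make_zip(files: dict) -> zipfile.ZipFile:
    """Build an in-memory zip from {name: bytes}; a stand-in for a real download."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    buf.seek(0)
    return zipfile.ZipFile(buf)


if __name__ == "__main__":
    # Hypothetical contents mimicking the old (folder) and new (flat) layouts.
    old = make_zip({"inaugural/1789-Washington.txt": b"Fellow-Citizens\n"})
    new = make_zip({"1789-Washington.txt": b"Fellow-Citizens\n",
                    "2021-Biden.txt": b"Chief Justice Roberts\r\n"})
    for label, zf in (("old", old), ("new", new)):
        print(label, top_level_layout(zf))
        for name in zf.namelist():
            print("  ", name, line_ending_style(zf.read(name)))
```

To inspect a real download, open the file with zipfile.ZipFile(path) instead of make_zip(...). This only reports the layout and line endings; it does not show which of the two differences, if either, triggers the unzip error.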
Hey, thanks for the update. I will check whether bumping the nltk version works on my local machine.
For our application, though, we are being cautious not to break things. We resorted to removing inaugural from our list of corpora, since we don't specifically use it right now (it was just bloat).
I can confirm that bumping nltk to 3.6.5 works on Mac M1.
Closing this issue since it only affects people on older versions. We have nltk==3.2.4 for our legacy app. In case anyone else runs into this issue, just upgrade nltk.
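As a rough way to tell whether an environment is on an affected version, the sketch below compares the installed nltk version against 3.6.5. That is the version reported working in this thread, and using it as a threshold is an assumption, not a confirmed minimum. The helper names are hypothetical; the only nltk call used is nltk.download.

```python
import re
from importlib import metadata

# Reported working in this thread; treating it as a cutoff is an assumption.
KNOWN_GOOD = (3, 6, 5)


def version_tuple(version: str) -> tuple:
    """Turn '3.6.5' (or '3.6.5rc1') into (3, 6, 5), reading up to three parts."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_nltk_version():
    """Return the installed nltk version string, or None if nltk is absent."""
    try:
        return metadata.version("nltk")
    except metadata.PackageNotFoundError:
        return None


def needs_upgrade(version) -> bool:
    """True if nltk is missing or older than the version reported working."""
    return version is None or version_tuple(version) < KNOWN_GOOD


def download_inaugural() -> bool:
    """Download the corpus through nltk's own downloader (needs network access)."""
    import nltk  # imported lazily so the checks above work without nltk
    return nltk.download("inaugural")


if __name__ == "__main__":
    current = installed_nltk_version()
    if needs_upgrade(current):
        print(f"nltk version {current!r} may be affected; try: pip install -U nltk")
    else:
        print(f"nltk {current} is at or above 3.6.5; download_inaugural() should be safe to try")
```

In a CI job, a check like needs_upgrade(installed_nltk_version()) could run before the corpus download, so the failure shows up as a clear message rather than the retry prompt.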