nltk/nltk_data

nltk_data compatibility with Windows

benhuff opened this issue · 0 comments

I am running into a problem installing nltk_data through conda-forge on Windows. The installation fails on the file con.xml inside the propbank corpus:

Downloading and Extracting Packages
nltk_data-2019.07.04 | 428.2 MB  | ############################################################################ | 100%
Preparing transaction: done
Verifying transaction: failed

CondaVerificationError: The package for nltk_data located at C:\Users\####\.conda\pkgs\nltk_data-2019.07.04-0
appears to be corrupted. The path 'lib/nltk_data/corpora/propbank/frames/con.xml'
specified in the package manifest cannot be found.

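I believe this is happening because CON is a reserved device name on Windows, and the restriction applies even when the name carries an extension, so a file called con.xml cannot be created. Unzipping the propbank.zip archive manually with 7-Zip automatically renames this file to _con.xml.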
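To check whether other files in the data would hit the same problem, something like the following could scan an archive for member names that collide with Windows reserved device names. This is only a rough sketch: the helper names `is_reserved_on_windows` and `find_reserved_members` are made up for illustration, and the reserved list covers only the commonly documented device names (CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9).

```python
import zipfile
from pathlib import PurePosixPath

# Device names reserved by Windows. A file whose base name (the part before
# the first dot) matches one of these cannot be created, whatever its extension.
WINDOWS_RESERVED = {
    "CON", "PRN", "AUX", "NUL",
    *(f"COM{i}" for i in range(1, 10)),
    *(f"LPT{i}" for i in range(1, 10)),
}


def is_reserved_on_windows(path: str) -> bool:
    """Return True if any component of a '/'-separated path uses a reserved name."""
    for part in PurePosixPath(path).parts:
        # Compare only the text before the first dot, case-insensitively.
        base = part.split(".", 1)[0].rstrip(" ").upper()
        if base in WINDOWS_RESERVED:
            return True
    return False


def find_reserved_members(zip_path):
    """List the members of a zip archive whose names are reserved on Windows."""
    with zipfile.ZipFile(zip_path) as zf:
        return [name for name in zf.namelist() if is_reserved_on_windows(name)]
```

Calling `find_reserved_members("propbank.zip")` on a local copy of the archive should then list con.xml, along with any other offending files, if there are any.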

Could this file be renamed in this repo to nltk_data/corpora/propbank/frames/_con.xml, or would it be preferable to solve this issue (see conda-forge/nltk_data-feedstock#1 (comment)) specifically in the https://github.com/conda-forge/nltk_data-feedstock repository?