attardi/wikiextractor

Missing file WikiExtractor.py

Closed this issue · 7 comments

The file was deleted in one of the recent commits.

Looks like it's in "b13d447 - (8 hours ago) Removed scripts directory. - attardi" but not "6675c69 - (4 hours ago) Created PyPi release. - attardi"
There were some major changes to the repository, and I figured that the structure had changed so that WikiExtractor.py wasn't needed. But I haven't been able to get it to work with the new instructions yet. Anyone else having success?

nope, which is a major bummer cause a lot of people use this

rhn19 commented

Install it using pip & then run the module as a script using "python -m"
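A minimal sketch of that suggestion, assuming the pip package exposes a module importable as `wikiextractor.WikiExtractor` (the import path mentioned elsewhere in this thread) and that it takes the dump file as its argument. The helper name `build_command` and the dump filename are hypothetical, used only for illustration.

```python
import sys


def build_command(dump_path, module="wikiextractor.WikiExtractor"):
    """Build the argv list for running a module as a script with "python -m".

    Assumes "pip install wikiextractor" has already been run in the same
    environment as the current interpreter.
    """
    # sys.executable keeps the call tied to the interpreter pip installed into.
    return [sys.executable, "-m", module, dump_path]


if __name__ == "__main__":
    # Hypothetical dump filename; replace with the path to your own dump.
    cmd = build_command("enwiki-latest-pages-articles.xml.bz2")
    print(" ".join(cmd))
    # To actually run it:
    # import subprocess
    # subprocess.run(cmd, check=True)
```

Using `sys.executable` rather than a bare `python` avoids accidentally invoking a different interpreter than the one the package was installed for.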

The file was deleted in one of the recent commits.

I found a trick to get WikiExtractor.py: 1/ Run "pip install wikiextractor". You will then be able to "from wikiextractor.WikiExtractor import version", which indicates that WikiExtractor.py exists in the installed package. 2/ Copy WikiExtractor.py somewhere else and uninstall wikiextractor. 3/ git clone or download the latest project (which is missing WikiExtractor.py), and move the copy you saved into wikiextractor-master/wikiextractor. It works!
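Step 2 of the trick above requires finding where pip put the file. One way to do that, sketched here with the standard library only, is to ask Python for the module's location. The helper names `locate_module_file` and `copy_module_file` are hypothetical, and the module name `wikiextractor.WikiExtractor` is taken from the import quoted in this comment.

```python
import importlib.util
import shutil
from pathlib import Path


def locate_module_file(module_name):
    """Return the source file path of an importable module, or None."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when the parent package of a dotted name is not installed.
        return None
    if spec is None or not spec.has_location:
        return None
    return Path(spec.origin)


def copy_module_file(module_name, dest_dir):
    """Copy a module's source file into dest_dir and return the new path."""
    src = locate_module_file(module_name)
    if src is None:
        raise FileNotFoundError(f"module {module_name!r} is not installed")
    dest = Path(dest_dir) / src.name
    shutil.copy2(src, dest)
    return dest


if __name__ == "__main__":
    # Prints the installed file's path, or None if the package is missing.
    print(locate_module_file("wikiextractor.WikiExtractor"))
```

Calling `copy_module_file("wikiextractor.WikiExtractor", ".")` before uninstalling would save a copy to the current directory, assuming the package is installed.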

Install it using pip & then run the module as a script using "python -m"

FYI wikiextractor from pip is a bit outdated (updated last year)

I don't have pip and I can't install it in docker.

# apt install python-pip
Reading package lists... Done
Building dependency tree       
Reading state information... Done
E: Unable to locate package python-pip

Should be fixed now. Thank you.