
A Python library for Wikipedia information retrieval and extraction, and for digital humanities computing.

Primary language: Python. License: MIT.

WeKeyPedia Python toolkit

installation

using virtualenv

The PyPI distribution is updated for important releases. During the development phase, this happens approximately every week.

$ mkdir e
$ virtualenv e/py
$ source e/py/bin/activate
(py)$ pip install wekeypedia
(py)$ python -m nltk.downloader punkt wordnet maxent_treebank_pos_tagger

using development version

If you need the most up-to-date version, you can install the master branch from GitHub. Be warned that it is highly unstable: you get work-in-progress features, along with their bugs and their bug fixes, in real time.

$ mkdir e
$ virtualenv e/py
$ source e/py/bin/activate
(py)$ pip install https://github.com/wekeypedia/toolkit-python/archive/master.zip
(py)$ python -m nltk.downloader punkt wordnet maxent_treebank_pos_tagger

usage

get the current content of a page

import wekeypedia

p = wekeypedia.WikipediaPage("Pi")
content = p.get_revision()

print(content)
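The README does not say how `get_revision()` retrieves a page. As a rough illustration only, assuming the public MediaWiki Action API at `https://en.wikipedia.org/w/api.php` (not confirmed to be what wekeypedia uses internally), a request for the wikitext of a page's latest revision could be built with the standard library like this. `build_revision_url` and `fetch_current_content` are hypothetical helpers, not part of wekeypedia.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the public MediaWiki Action API endpoint for English Wikipedia.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_revision_url(title: str) -> str:
    """Hypothetical helper: URL requesting the latest revision's wikitext of `title`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return API_URL + "?" + urlencode(params)


def fetch_current_content(title: str) -> str:
    """Hypothetical helper: perform the request and return the wikitext (needs network access)."""
    req = Request(build_revision_url(title), headers={"User-Agent": "example-script/0.1"})
    with urlopen(req) as resp:
        data = json.load(resp)
    # With formatversion=2, pages is a list and the text sits under slots -> main -> content.
    page = data["query"]["pages"][0]
    return page["revisions"][0]["slots"]["main"]["content"]
```

Building the URL needs no network, so `build_revision_url("Pi")` can be inspected offline; `fetch_current_content("Pi")` actually contacts Wikipedia.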

parse diff result

diff = p.get_diff()
plusminus = p.extract_plusminus(diff)

p.print_plusminus_overview(plusminus)
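The README does not document what `extract_plusminus` returns. As a hedged illustration of the idea, assuming "plusminus" means the lines added to and removed from a page between two revisions, the standard library's `difflib` can produce a comparable split. `plusminus`, `print_overview` and the `{"+": [...], "-": [...]}` layout are hypothetical and are not wekeypedia's actual data format.

```python
import difflib
from typing import Dict, List


def plusminus(old: str, new: str) -> Dict[str, List[str]]:
    """Hypothetical stand-in: collect lines added ('+') and removed ('-') between two texts."""
    result: Dict[str, List[str]] = {"+": [], "-": []}
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        # ndiff prefixes each line with '+ ', '- ', '  ' (unchanged) or '? ' (hint lines).
        tag, text = line[:2], line[2:]
        if tag == "+ ":
            result["+"].append(text)
        elif tag == "- ":
            result["-"].append(text)
    return result


def print_overview(pm: Dict[str, List[str]]) -> None:
    """Print a short summary, loosely mirroring the idea of print_plusminus_overview."""
    print(f"+{len(pm['+'])} / -{len(pm['-'])} lines")
    for text in pm["+"]:
        print("+", text)
    for text in pm["-"]:
        print("-", text)
```

For example, `plusminus("a\nb\nc", "a\nB\nc\nd")` yields `{"+": ["B", "d"], "-": ["b"]}`. Working on whole lines keeps the sketch simple; a word-level split would need a different tokenization.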

count stems of a page

print(p.count_stems([content]))
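`count_stems` takes a list of texts. The installation step downloads NLTK's `punkt` and `wordnet` data, which suggests (the README does not confirm it) that the library tokenizes and normalizes words with NLTK. The sketch below shows the general idea using only the standard library: a crude suffix-stripping function stands in for a real stemmer such as NLTK's Porter stemmer, and `naive_stem` and `count_stems` here are hypothetical, not wekeypedia's implementation.

```python
import re
from collections import Counter
from typing import Iterable

# Crude stand-in for a real stemmer; suffixes are tried longest first.
SUFFIXES = ("ingly", "edly", "ing", "ed", "es", "ly", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_stems(texts: Iterable[str]) -> Counter:
    """Lowercase, extract alphabetic tokens, stem them and count across all texts."""
    counts: Counter = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            counts[naive_stem(word)] += 1
    return counts
```

For instance, `count_stems(["Computing digits; computed digits."])` returns `Counter({"comput": 2, "digit": 2})`: "computing" and "computed" collapse onto one stem, which is the point of counting stems rather than raw words.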

examples and macros

You can explore how the library is currently used by looking at the examples and macros we use to build various datasets.

using virtualenv

$ virtualenv e/py
$ source e/py/bin/activate
(py)$ pip install -r requirements.txt