mediawiki-parser
This is a parser and some utilities for the MediaWiki markup language.
At the moment the primary interface is the mediawiki-links utility, which accepts a MediaWiki XML dump file on stdin and writes a set of link graph edges to stdout. The output format is a tab-separated text file with the following columns:
- source node name
- link target node name
- link target namespace
- link anchor text
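
As a rough illustration of consuming this output downstream, the sketch below parses lines with the four columns listed above into named records. It is not part of this repository: the names LinkEdge and parse_edges are hypothetical, the sample rows are invented, and it assumes that fields never contain tabs or newlines and that rows without exactly four fields can be skipped.

```python
from typing import Iterable, Iterator, NamedTuple


class LinkEdge(NamedTuple):
    """One link graph edge, mirroring the four documented columns."""
    source: str      # source node name
    target: str      # link target node name
    namespace: str   # link target namespace (assumed to possibly be empty)
    anchor: str      # link anchor text


def parse_edges(lines: Iterable[str]) -> Iterator[LinkEdge]:
    """Yield a LinkEdge for each well-formed tab-separated line.

    Assumption: rows that do not have exactly four fields are
    malformed and are silently skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue
        yield LinkEdge(*fields)


# Invented sample rows, for illustration only.
sample = [
    "Haskell\tMonad\t\tmonads\n",
    "Haskell\tGHC\tCategory\tthe compiler\n",
]

for edge in parse_edges(sample):
    print(edge.source, "->", edge.target)
```

In practice the same function could be fed an open file handle containing the saved output, since it only needs an iterable of lines.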
Installation
- Download and install the Haskell Platform
- Run cabal update
- Clone this repository: git clone git://github.com/bgamari/mediawiki-parser
- Run cabal install from within the repository
- The mediawiki-import executable can be found in ~/.cabal/bin