Author: Bill Thompson (biltho@mpi.nl)
A simple Python script to extract inter-language links from Wikimedia SQL dumps. It writes out a CSV with (page_id, target_language, page_title_in_target_language) columns.
Usage:
python main.py -f avwiki-latest-langlinks.sql.gz
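As a rough illustration of the kind of extraction described above, here is a minimal sketch. It is not the code in main.py. It assumes the dump stores rows as `INSERT INTO ... VALUES (ll_from,'ll_lang','ll_title'),...` statements, and the names `parse_line`, `parse_dump`, `ROW_RE` and `unescape`, as well as the CSV header row, are hypothetical choices made for this example.

```python
import csv
import gzip
import re

# One (page_id,'lang','title') tuple. Quoted fields may contain
# backslash-escaped characters such as \' (assumed dump format).
ROW_RE = re.compile(r"\((\d+),'((?:[^'\\]|\\.)*)','((?:[^'\\]|\\.)*)'\)")


def unescape(text):
    # Simplification: drop the backslash and keep the escaped character.
    # Sequences such as \n are not turned into control characters.
    return re.sub(r"\\(.)", r"\1", text)


def parse_line(line):
    """Return (page_id, target_language, title) tuples found in one line."""
    if not line.startswith("INSERT INTO"):
        return []
    return [
        (int(m.group(1)), unescape(m.group(2)), unescape(m.group(3)))
        for m in ROW_RE.finditer(line)
    ]


def parse_dump(sql_gz_path, csv_path):
    """Stream a gzipped SQL dump and write the extracted rows to a CSV file."""
    with gzip.open(sql_gz_path, "rt", encoding="utf-8", errors="replace") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        # Header row is an assumption; the real output may not include one.
        writer.writerow(
            ["page_id", "target_language", "page_title_in_target_language"]
        )
        for line in src:
            writer.writerows(parse_line(line))
```

Reading the dump line by line keeps memory use bounded by the longest INSERT statement rather than by the size of the whole file.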
The latest dumps for the English Wikipedia, for example, can be found under dumps.wikimedia.org/enwiki/latest/. This script works on the SQL version of the langlinks table dump (e.g. enwiki-latest-langlinks.sql.gz). This repository contains an example dump (avwiki-latest-langlinks.sql.gz) and an example of the parsed result (avwiki-latest-langlinks-parsed.csv). The latest dumps in other languages can be found at:
dumps.wikimedia.org/LANGUAGEwiki/latest/LANGUAGEwiki-latest-langlinks.sql.gz
where LANGUAGE is replaced by the ISO language code (e.g. en, ab, es, pt, fr).
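The URL pattern above can be filled in programmatically. The following is a small sketch, not part of this repository: the function names `langlinks_dump_url` and `download_dump` are hypothetical, and the `https://` scheme is an assumption, since the pattern above does not state one.

```python
from urllib.request import urlretrieve


def langlinks_dump_url(language):
    """Build the langlinks dump URL for a language code such as 'en' or 'av'."""
    # The https:// scheme is assumed; the rest follows the documented pattern.
    return (
        f"https://dumps.wikimedia.org/{language}wiki/latest/"
        f"{language}wiki-latest-langlinks.sql.gz"
    )


def download_dump(language, target_path=None):
    """Download the dump to target_path (defaults to the dump's own file name)."""
    url = langlinks_dump_url(language)
    if target_path is None:
        target_path = url.rsplit("/", 1)[-1]
    urlretrieve(url, target_path)  # performs a network request
    return target_path
```

For example, `download_dump("av")` would save avwiki-latest-langlinks.sql.gz in the current directory, ready to be passed to main.py with -f.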