This is a set of scripts used for:
- one-time import of descriptions from the Polish Wikipedia index of biographies to Wikidata (https://pl.wikipedia.org/wiki/Kategoria:Noty_biograficzne)
- generation and management of a new version of said index using aggregated data from Wikipedia and Wikidata (https://pl.wikipedia.org/wiki/Wikipedia:Indeks_biografii)
Written in Ruby and JavaScript.
Released under the MIT License. Some files are dual-licensed under CC BY-SA so that they can be freely pasted onto pages of Wikimedia projects.
For details and the list of contributors, see LICENSE.
A whole lot. Apart from the standard Ruby library, some of the scripts require the following gems (the latest versions available as of 2013-09-27):
- roman
- json
- nokogiri
- parallel
- sunflower
- unicode_utils
- unidecoder
The code has only been tested on Ruby 1.9.3. It will probably run on newer Rubies, too.
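
To see which of the gems listed above are already installed, a small check along these lines may help. This is only a sketch: `REQUIRED_GEMS` and `missing_gems` are hypothetical names that do not exist in this repository, and the check relies on RubyGems' `Gem::Specification.find_all_by_name`.

```ruby
require 'rubygems'

# Gem names copied from the list above.
REQUIRED_GEMS = %w[roman json nokogiri parallel sunflower unicode_utils unidecoder].freeze

# Hypothetical helper: returns the names RubyGems cannot find locally.
def missing_gems(names)
  names.reject { |name| Gem::Specification.find_all_by_name(name).any? }
end

missing = missing_gems(REQUIRED_GEMS)
if missing.empty?
  puts 'All listed gems are installed.'
else
  puts "Missing gems: #{missing.join(', ')}"
  puts "Try: gem install #{missing.join(' ')}"
end
```

The check only looks at gem names, not versions, so it will not tell you whether an installed version matches the ones the scripts were written against.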
Most of the text (in Polish) and configuration (for the Polish Wikipedia) is hardcoded in the .rb and .js files. Sorry 'bout that.
Brief description of each file:
- Wikipedia gadget
  - bioindex-editor.css and bioindex-editor.js – a gadget that allows editors to modify the Wikidata descriptions and Wikipedia defaultsorts straight from the index itself.
  - bioindex-editor-bootstrap.js – minimal loader for the gadget, to be added to common.js.
- Primary scripts
  - build-index.rb – aggregate data from all sources and upload them to the index. Takes a few hours to run; it writes temporary 'savepoints' that a later run uses as its starting point, so it can be terminated at will without losing all the work.
  - parse-index.rb – parse the old index of biographies and dump the data in JSON format to the current directory.
  - upload-index.rb – upload the data generated by the above script to Wikidata.
  - sprzeczne.rb – compare the birth and death years aggregated from categories with those from the old index of biographies, and return a pretty table.
- Mini-libraries
  - intro-extractor.rb – extracts brief descriptions and lifetime information from given Wikipedia pages.
  - roman.rb – wrapper for the `roman` gem to fix its broken handling for negative numbers (used to deal with centuries BC).
  - savepoint.rb – short wrapper for `Marshal.load` and `.dump` from/to file.
- Miscellanea
  - .gitignore – contains a list of temporary files running the Ruby scripts might generate.
  - LICENSE – MIT / CC BY-SA.
  - README.md – this file.
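
The savepoint idea mentioned above (a wrapper around `Marshal.load` and `Marshal.dump` that lets a long-running script resume) can be sketched roughly as follows. This is not the code from savepoint.rb: the module name `SavepointSketch`, its methods `write`, `read` and `fetch`, and the sample data are all made up for illustration.

```ruby
require 'tmpdir'

# Hypothetical helper; the real savepoint.rb may look quite different.
module SavepointSketch
  # Serialize any Marshal-compatible object to a file and return it.
  def self.write(path, object)
    File.open(path, 'wb') { |file| Marshal.dump(object, file) }
    object
  end

  # Read an object back from a file written by .write.
  def self.read(path)
    File.open(path, 'rb') { |file| Marshal.load(file) }
  end

  # Reuse the saved result if the file exists; otherwise run the
  # (presumably slow) block once and save what it returns.
  def self.fetch(path)
    return read(path) if File.exist?(path)
    write(path, yield)
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'step1.marshal')
  first  = SavepointSketch.fetch(path) { { 'step' => 1, 'items' => %w[a b c] } }
  second = SavepointSketch.fetch(path) { raise 'not reached: savepoint exists' }
  puts first == second
end
```

With this shape, a script that is killed halfway can be restarted and will skip every step whose savepoint file is already on disk. Note that `Marshal.load` should only be used on files you created yourself, since loading untrusted data is unsafe.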
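
For the centuries-BC case that roman.rb deals with, one possible approach is sketched below. It does not use the `roman` gem and is not the wrapper from this repository: `to_roman` and `century_numeral` are hypothetical names, and representing a BC century as a negative integer that gets converted by absolute value is an assumption made for this example.

```ruby
ROMAN_DIGITS = [
  [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'],
  [100, 'C'], [90, 'XC'], [50, 'L'], [40, 'XL'],
  [10, 'X'], [9, 'IX'], [5, 'V'], [4, 'IV'], [1, 'I']
].freeze

# Convert a positive integer to a Roman numeral string.
def to_roman(number)
  raise ArgumentError, 'number must be positive' unless number > 0
  remaining = number
  ROMAN_DIGITS.reduce('') do |result, (value, digit)|
    count, remaining = remaining.divmod(value)
    result + digit * count
  end
end

# Assumed convention: a negative century means BC. Returns the numeral
# of the absolute value together with a flag saying whether it is BC.
def century_numeral(century)
  [to_roman(century.abs), century < 0]
end

numeral, bc = century_numeral(-5)
puts bc ? "#{numeral} BC" : numeral
```

Keeping the BC flag separate from the numeral leaves the wording of the suffix to the caller, which matters when the output text is in Polish rather than English.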