
Convert Wikipedia XML dumps to JSON

Primary language: Go. License: MIT.

wikikit

Convert a Wikipedia XML dump into JSON.

$ wikikit WIKIPEDIA-XML-DUMP


Output:

{
   "redirect" : {
      "title" : ""
   },
   "text" : "{{Red ....",
   "ctitle" : "anarchism",
   "title" : "Anarchism"
}

Extract category information only:

$ wikikit -c "Kategorie" WIKIPEDIA-XML-DUMP
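
For context, "Kategorie" is the category namespace name on the German Wikipedia, where category links in wikitext look like `[[Kategorie:Name]]` or `[[Kategorie:Name|sort key]]`. The README does not show how wikikit extracts them or what the `-c` output looks like, so the following is only a minimal sketch of what matching such links might involve. The `categories` function and the sample text are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// categories returns the names of all category links in wikitext that use
// the given namespace prefix (e.g. "Kategorie" or "Category").
// An optional sort key after "|" is dropped.
func categories(prefix, wikitext string) []string {
	re := regexp.MustCompile(`\[\[` + regexp.QuoteMeta(prefix) + `:([^\]|]+)(?:\|[^\]]*)?\]\]`)
	var out []string
	for _, m := range re.FindAllStringSubmatch(wikitext, -1) {
		out = append(out, strings.TrimSpace(m[1]))
	}
	return out
}

func main() {
	// Hypothetical wikitext fragment for illustration.
	text := "Text [[Kategorie:Anarchismus| ]] more [[Kategorie:Politische Ideologie]] [[Datei:X.png]]"
	for _, c := range categories("Kategorie", text) {
		fmt.Println(c)
	}
}
```

Passing the prefix as a parameter mirrors the command line above, where the namespace name depends on the language edition of the dump.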

Extract authority data only:

$ wikikit -a "Authority control" WIKIPEDIA-XML-DUMP
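
Authority data on Wikipedia is usually carried by a template such as `{{Authority control|KEY=value|...}}`, which is presumably why the template name is passed as an argument. The README does not describe the `-a` output or the parsing approach, so the sketch below only illustrates one way such a template could be read into key/value pairs. The `authorityData` function and the identifier values are hypothetical, and nested templates are not handled.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// authorityData finds the first {{name|k=v|...}} template in wikitext and
// returns its named parameters. It returns nil if the template is absent.
// Nested templates inside the parameters are not supported.
func authorityData(name, wikitext string) map[string]string {
	re := regexp.MustCompile(`\{\{\s*` + regexp.QuoteMeta(name) + `\s*\|([^{}]*)\}\}`)
	m := re.FindStringSubmatch(wikitext)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for _, part := range strings.Split(m[1], "|") {
		kv := strings.SplitN(part, "=", 2)
		if len(kv) != 2 {
			continue // skip positional parameters
		}
		out[strings.TrimSpace(kv[0])] = strings.TrimSpace(kv[1])
	}
	return out
}

func main() {
	// Made-up identifier values for illustration only.
	text := "Some article text. {{Authority control|VIAF=123|LCCN=n/79/1}}"
	fmt.Println(authorityData("Authority control", text))
}
```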

De-literalize JSON text from Wikidata pages/articles dumps:

$ wikikit -d WIKIDATA-XML-DUMP