/wikidata-topic-model

Map Wikidata items to a taxonomy of topics from WikiProjects

Primary language: Python. License: MIT.

Generating topics for Wikidata items that have a Wikipedia article (sitelinks)

This is an offline implementation of the Wikidata Topic Model API. The aim is to compute the topics for all Wikidata items that have an article in any Wikipedia.

These notebooks use the WMF's Hadoop cluster. If you don't have access to that cluster, you will need to rewrite the code to read from the Wikidata dump instead.

Here we use two tables from the Wikimedia Data Lake:

  • wmf.wikidata_item_page_link: Contains the relation between Wikidata items and page titles. These results are equivalent to the 'sitelinks' value that you will find in the Wikidata dump.

  • wmf.wikidata_entity: From this table we extract the claims for each Wikidata item. You will find equivalent information in the 'claims' field of the Wikidata dump.
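
For readers without cluster access, the following is a minimal sketch of how the same two pieces of information could be read from a Wikidata JSON dump, where each line holds one entity. It is not part of the notebooks: the function names and the `SAMPLE_LINE` record are hypothetical and only illustrate the 'sitelinks' and 'claims' fields mentioned above. Note that 'sitelinks' in the dump also covers non-Wikipedia projects, so a real rewrite would probably need to filter them.

```python
import json

# Hypothetical, heavily trimmed dump line for illustration only.
# In the JSON dump every entity sits on its own line, followed by a comma.
SAMPLE_LINE = (
    '{"id": "Q42", '
    '"sitelinks": {"enwiki": {"site": "enwiki", "title": "Douglas Adams"}}, '
    '"claims": {"P31": [{"mainsnak": {"snaktype": "value", "property": "P31", '
    '"datavalue": {"value": {"entity-type": "item", "numeric-id": 5, "id": "Q5"}, '
    '"type": "wikibase-entityid"}}}]}},'
)


def parse_dump_line(line):
    """Parse one dump line; return None for blank lines and the array brackets."""
    line = line.strip().rstrip(",")
    if line in ("", "[", "]"):
        return None
    return json.loads(line)


def extract_sitelinks(entity):
    """Return {site: page title}, the counterpart of wmf.wikidata_item_page_link."""
    return {
        site: link["title"]
        for site, link in entity.get("sitelinks", {}).items()
    }


def extract_claim_pairs(entity):
    """Return (property, item ID) pairs for claims whose value is another item."""
    pairs = []
    for prop, statements in entity.get("claims", {}).items():
        for statement in statements:
            snak = statement.get("mainsnak", {})
            if snak.get("snaktype") != "value":
                continue  # skip "somevalue" / "novalue" snaks
            value = snak.get("datavalue", {}).get("value")
            # Only item-valued claims carry an "id" key; dates, strings,
            # quantities and coordinates are ignored in this sketch.
            if isinstance(value, dict) and "id" in value:
                pairs.append((prop, value["id"]))
    return pairs


entity = parse_dump_line(SAMPLE_LINE)
sitelinks = extract_sitelinks(entity)
claim_pairs = extract_claim_pairs(entity)
```

Whether item-valued claims alone are the right input depends on the model used in the notebooks, so treat the filtering in `extract_claim_pairs` as an assumption.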

This code is based on the wikidata-topic-model-api. If you want to get the topics for a single Wikidata item (or a small set of items), we recommend using this experimental API: https://tools.wmflabs.org/wiki-topic/