Map Wikidata items to a taxonomy of topics from WikiProjects. This approach represents a Wikipedia article by the claims contained in its Wikidata item. The topics are derived from the WikiProject directory. Currently, this repository just contains a Flask app that provides predictions from a pre-trained model, but eventually it will be expanded to include the entire training process.
cd app
python3 app.py
NOTE: you must have the fastText Python module installed. See https://fasttext.cc/docs/en/support.html for how to install.
After starting the app as described above, queries can be made via the browser. For example, for Toni Morrison:
http://127.0.0.1:5000/api/v1/wikidata/topic?qid=Q72334
The threshold above which a topic is returned [0-1] can be set via the threshold parameter but otherwise defaults to 0.5:
http://127.0.0.1:5000/api/v1/wikidata/topic?qid=Q72334&threshold=0.1
Append the debug parameter for additional output, including all of the topics and scores and the Wikidata claims processed by the model:
http://127.0.0.1:5000/api/v1/wikidata/topic?qid=Q72334&debug
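The same queries can also be issued from a script. The following is a minimal sketch, not part of this repository: the helper names build_topic_url and fetch_topics are hypothetical, and fetch_topics assumes the endpoint returns a JSON body, which is not confirmed above. Only the endpoint path and the qid, threshold, and debug parameters come from the example URLs.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the example URLs above (local development server).
BASE_URL = "http://127.0.0.1:5000/api/v1/wikidata/topic"


def build_topic_url(qid, threshold=None, debug=False, base_url=BASE_URL):
    """Build a query URL for one Wikidata item (hypothetical helper)."""
    params = {"qid": qid}
    if threshold is not None:
        if not 0 <= threshold <= 1:
            raise ValueError("threshold must be between 0 and 1")
        params["threshold"] = threshold
    url = f"{base_url}?{urlencode(params)}"
    if debug:
        # The example URL passes debug as a bare flag with no value.
        url += "&debug"
    return url


def fetch_topics(qid, threshold=None, debug=False):
    """Query the running app; assumes the response body is JSON."""
    with urlopen(build_topic_url(qid, threshold, debug)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the app running, fetch_topics("Q72334", threshold=0.1) would request the same URL as the threshold example above.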
To get a sense of why the model is making the predictions it is, you can enable explanations for each prediction. The explanations are generated via LIME (https://github.com/marcotcr/lime) and indicate a best guess at which Wikidata properties / values were most influential in making the prediction for that label. Generating them can slow down processing, so they are off by default. To turn them on, set the PROVIDE_EXPLANATIONS variable in app.py to True and restart the app (Ctrl+C and then rerun python3 app.py).
The bulk script takes in a file with JSON objects containing the Wikidata IDs to query (and any additional metadata) and outputs these JSON objects with the predicted labels added. Example input / output data is provided in the bulk/data folder.
cd bulk
python3 wikidata_ids_to_topics.py --help
python3 wikidata_ids_to_topics.py
NOTE: like the app, you must have the fastText Python module installed. See https://fasttext.cc/docs/en/support.html for how to install.
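To prepare or inspect input for the bulk script, a small helper along these lines may be useful. This is only a sketch under stated assumptions: it assumes one JSON object per line, and the "qid" and "note" keys are hypothetical placeholders. Check the example files in bulk/data for the field names the script actually expects.

```python
import json


def parse_json_lines(lines):
    """Parse an iterable of text lines, one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def to_json_lines(records):
    """Serialize records as text with one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


# Hypothetical records: the key names are assumptions, not the script's schema.
records = [
    {"qid": "Q72334", "note": "example metadata"},
    {"qid": "Q42"},
]
text = to_json_lines(records)
# Round trip: parsing the serialized text gives back the same records.
assert parse_json_lines(text.splitlines()) == records
```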