Used to filter the Wikidata dump for WikiQA's MongoDB database.
npm
nodejs
npm packages:
#wikidata-filter@3.0.2 (no longer used: it was buggy when simplifying claims and required some dirty hacks; the package below has fewer such problems)
wikibase-dump-filter@5.0.5
load-balance-lines@1.0.5
conda create -n wikidata-filter nodejs
conda activate wikidata-filter
npm install -g wikibase-dump-filter@5.0.5
npm install -g load-balance-lines@1.0.5
wikidata-filter references (the package has been renamed to wikibase-dump-filter by its author, maxlath):
- https://github.com/maxlath/wikibase-dump-filter
- https://github.com/maxlath/wikibase-dump-filter/blob/master/docs/cli.md
- https://www.npmjs.com/package/wikidata-filter
- https://www.npmjs.com/package/wikibase-dump-filter
load-balance-lines:
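load-balance-lines spreads the lines it reads on stdin across several worker processes, which can shorten a filtering run on a multi-core machine. The following is a minimal sketch of that pattern, not a copy of the scripts used here. The language list `en,zh` and the choice of input and output files are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: run the filter in parallel with load-balance-lines.
# File names are taken from this README; the language list is an assumption.
PAR_DUMP="wikidata-20191118-all.json"
PAR_OUT="wikidata_entities_en_zh.json"
PAR_LANGS="en,zh"

if command -v load-balance-lines >/dev/null 2>&1 \
  && command -v wikibase-dump-filter >/dev/null 2>&1 \
  && [ -f "$PAR_DUMP" ]; then
  # load-balance-lines starts several copies of the command that follows it
  # and feeds each copy a share of the input lines. nice lowers the priority
  # so the machine stays usable during the long run.
  nice -n 19 load-balance-lines wikibase-dump-filter --languages "$PAR_LANGS" \
    < "$PAR_DUMP" > "$PAR_OUT"
else
  echo "load-balance-lines, wikibase-dump-filter or $PAR_DUMP not found; nothing done" >&2
fi
```

Because the lines are processed by several workers, the order of entities in the output may differ from their order in the dump.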
Entities:
wikidata-20191118-all.json (847G): sh get_wikidata_dump.sh
wikidata_entities_en_zh.json and wikidata_entities_zh_tw_cn_sim_richvalues_types.json (70G): sh run-wikidata-filter.sh
wikidata_properties.json and wikidata_properties_sim.json: sh run-wikidata-filter-properties.sh
Note that the above scripts take quite a long time to finish.
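The contents of the scripts are not reproduced here. As a rough, hypothetical sketch of the kind of commands they might wrap, the example below filters the dump once by language and once by entity type. The helper name `filter_dump`, the language list `en,zh`, and the mapping of flags to output files are assumptions; `--languages` and `--type` are options described in the wikibase-dump-filter CLI documentation.

```shell
#!/bin/sh
# Hypothetical sketch; not the actual run-wikidata-filter*.sh scripts.
DUMP="wikidata-20191118-all.json"

filter_dump() {
  # $1 is the output file; the remaining arguments go to wikibase-dump-filter.
  out="$1"
  shift
  wikibase-dump-filter "$@" < "$DUMP" > "$out"
}

if command -v wikibase-dump-filter >/dev/null 2>&1 && [ -f "$DUMP" ]; then
  # Keep only English and Chinese terms (assumed from the output file name).
  filter_dump wikidata_entities_en_zh.json --languages en,zh
  # Keep only property entities instead of items.
  filter_dump wikidata_properties.json --type property
else
  echo "wikibase-dump-filter or $DUMP not found; nothing done" >&2
fi
```

Each call reads the whole dump from start to finish, which is one reason runs of this kind are slow.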
wikidata_entities_zh_tw_cn_sim_richvalues_types.json
wikidata_properties_sim.json