DBpedia-Entity is a standard test collection for entity search. It was first released as DBpedia-Entity v1 [1] and later updated as DBpedia-Entity v2 [2]. This repository contains the collection, baseline runs, and other details about the DBpedia dump and index.
For detailed information, please check the DBpedia-Entity v2 paper and poster.
The collection consists of a set of heterogeneous entity-bearing queries, categorized into four groups:
- `SemSearch_ES`: Named entity queries; e.g., "brooklyn bridge" or "08 toyota tundra".
- `INEX-LD`: IR-style keyword queries; e.g., "electronic music genres".
- `QALD2`: Natural language questions; e.g., "Who is the mayor of Berlin?"
- `ListSearch`: Queries that seek a particular list of entities; e.g., "Professional sports teams in Philadelphia".
All queries are prefixed with the name of the originating benchmark. `SemSearch_ES`, `INEX-LD`, and `QALD2` each correspond to a separate category; the rest of the queries belong to the `ListSearch` category.
DBpedia-Entity v2 is built on DBpedia version 2015-10. The collection can be found under `collection/v2` and is organized as follows:

- `queries-v2.txt`: 467 queries, where each line contains a queryID and the query text.
- `queries-v2_stopped.txt`: The same queries, with stop patterns and punctuation marks removed.
- `qrels-v2.txt`: Relevance judgments in standard TREC format.
- `folds/`: 5 folds of train-test queries for each query subset, to be used for cross-validation in supervised approaches. If cross-validation is performed over all queries, `folds/all_queries.json` should be used.
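The query and qrels files are simple enough to load without dedicated tooling. Below is a minimal sketch, assuming tab-separated query lines (`queryID<TAB>query text`) and standard four-column TREC qrels lines (`queryID 0 entityID relevance`); the inline sample strings are illustrative, not taken from the collection:

```python
def parse_queries(lines):
    """Map queryID -> query text (assumes tab-separated lines)."""
    queries = {}
    for line in lines:
        qid, _, text = line.rstrip("\n").partition("\t")
        queries[qid] = text
    return queries

def parse_qrels(lines):
    """Map queryID -> {entityID: graded relevance} from TREC-format qrels."""
    qrels = {}
    for line in lines:
        qid, _, entity, rel = line.split()
        qrels.setdefault(qid, {})[entity] = int(rel)
    return qrels

# Hypothetical sample lines in the assumed formats:
queries = parse_queries(["INEX_LD-2009022\telectronic music genres"])
qrels = parse_qrels(["INEX_LD-2009022 0 <dbpedia:Electronic_dance_music> 1"])
```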
This repository also contains the DBpedia-Entity v1 collection, which was built on DBpedia version 3.7. The collection can be found under `collection/v1` and is organized similarly to v2. There are, however, three qrels files for DBpedia-Entity v1:
- `qrels-v1_37.txt`: The original qrels, based on DBpedia 3.7.
- `qrels-v1_39.txt`: Qrels with entity IDs updated to DBpedia 3.9.
- `qrels-v1_2015_10.txt`: Qrels with entity IDs updated to DBpedia 2015-10.
The `runs` folder contains all the baseline runs for this collection in TREC format. The following runs are made available:

- `/v1`: Runs for DBpedia-Entity v1, reported in Table 2 of [2].
- `/v2`: Runs for DBpedia-Entity v2, reported in the table below. These runs are compared with respect to NDCG at ranks 10 and 100. Any new run on DBpedia-Entity v2 should be compared against these results.
Model | SemSearch ES @10 | SemSearch ES @100 | INEX-LD @10 | INEX-LD @100 | ListSearch @10 | ListSearch @100 | QALD-2 @10 | QALD-2 @100 | Total @10 | Total @100
---|---|---|---|---|---|---|---|---|---|---
BM25 | 0.2497 | 0.4110 | 0.1828 | 0.3612 | 0.0627 | 0.3302 | 0.2751 | 0.3366 | 0.2558 | 0.3582 |
PRMS | 0.5340 | 0.6108 | 0.3590 | 0.4295 | 0.3684 | 0.4436 | 0.3151 | 0.4026 | 0.3905 | 0.4688 |
MLM-all | 0.5528 | 0.6247 | 0.3752 | 0.4493 | 0.3712 | 0.4577 | 0.3249 | 0.4208 | 0.4021 | 0.4852 |
LM | 0.5555 | 0.6475 | 0.3999 | 0.4745 | 0.3925 | 0.4723 | 0.3412 | 0.4338 | 0.4182 | 0.5036 |
SDM | 0.5535 | 0.6672 | 0.4030 | 0.4911 | 0.3961 | 0.4900 | 0.3390 | 0.4274 | 0.4185 | 0.5143 |
LM+ELR | 0.5554 | 0.6469 | 0.4040 | 0.4816 | 0.3992 | 0.4845 | 0.3491 | 0.4383 | 0.4230 | 0.5093 |
SDM+ELR | 0.5548 | 0.6680 | 0.4104 | 0.4988 | 0.4123 | 0.4992 | 0.3446 | 0.4363 | 0.4261 | 0.5211 |
MLM-CA | 0.6247 | 0.6854 | 0.4029 | 0.4796 | 0.4021 | 0.4786 | 0.3365 | 0.4301 | 0.4365 | 0.5143 |
BM25-CA | 0.5858 | 0.6883 | 0.4120 | 0.5050 | 0.4220 | 0.5142 | 0.3566 | 0.4426 | 0.4399 | 0.5329 |
FSDM | 0.6521 | 0.7220 | 0.4214 | 0.5043 | 0.4196 | 0.4952 | 0.3401 | 0.4358 | 0.4524 | 0.5342 |
BM25F-CA | 0.6281 | 0.7200 | 0.4394 | 0.5296 | 0.4252 | 0.5106 | 0.3689 | 0.4614 | 0.4605 | 0.5505 |
FSDM+ELR | 0.6563 | 0.7257 | 0.4354 | 0.5134 | 0.4220 | 0.4985 | 0.3468 | 0.4456 | 0.4590 | 0.5408 |
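For reference, NDCG@k for a single query can be sketched as below, using the standard log2 rank discount. The official numbers are typically produced with trec_eval's `ndcg_cut` measure (e.g. `trec_eval -m ndcg_cut.10,100 qrels-v2.txt run.txt`), which may differ in details such as tie handling, so treat this as an illustration rather than the exact scoring script:

```python
import math

def dcg(gains):
    # Discounted cumulative gain with the standard log2(rank + 1) discount.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranked_gains, judged_gains, k):
    # ranked_gains: graded relevance of retrieved entities, in rank order
    # (unjudged entities contribute gain 0).
    # judged_gains: relevance values of all judged entities for the query.
    ideal = dcg(sorted(judged_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal if ideal > 0 else 0.0

# A perfect ranking of the judged entities scores 1.0.
score = ndcg_at_k([2, 1, 1, 0], [0, 1, 2, 1], k=10)
```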
If using this collection in a publication, please cite the following paper:
```
@inproceedings{Hasibi:2017:DVT,
  author    = {Hasibi, Faegheh and Nikolaev, Fedor and Xiong, Chenyan and Balog, Krisztian and Bratsberg, Svein Erik and Kotov, Alexander and Callan, Jamie},
  title     = {DBpedia-Entity V2: A Test Collection for Entity Search},
  booktitle = {Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  series    = {SIGIR '17},
  year      = {2017},
  pages     = {1265--1268},
  doi       = {10.1145/3077136.3080751},
  publisher = {ACM}
}
```
If possible, please also include the http://tiny.cc/dbpedia-entity URL in your paper, where the data is available for download.
This research was partially supported by the Norwegian Research Council, National Science Foundation (NSF) grant IIS-1422676, a Google Faculty Research Award, and an Allen Institute for Artificial Intelligence Student Fellowship. We thank Saeid Balaneshin, Jan R. Benetka, Heng Ding, Dario Garigliotti, Mehedi Hasan, Indira Kurmantayeva, and Shuo Zhang for their help with creating the relevance judgments.
In case of questions, feel free to contact f.hasibi@cs.ru.nl or krisztian.balog@uis.no.
[1] K. Balog and R. Neumayer. "A Test Collection for Entity Search in DBpedia", In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '13), pages 737-740, 2013.
[2] F. Hasibi, F. Nikolaev, C. Xiong, K. Balog, S. E. Bratsberg, A. Kotov, and J. Callan. "DBpedia-Entity v2: A Test Collection for Entity Search", In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17), pages 1265-1268, 2017.