Similarity and relatedness datasets for Wikipedia entities (WikiSRS). These datasets were developed and published in the following paper:
- D Newman-Griffis, A M Lai, and E Fosler-Lussier, "Jointly Embedding Entities and Text with Distant Supervision". In Proceedings of the 3rd Workshop on Representation Learning for NLP (Repl4NLP), 2018.
The dataset is provided in the dataset folder as two CSVs, where each line gives two Wikipedia page IDs, their surface forms, the mean similarity/relatedness score assigned by MTurkers, the standard deviation of the scores, and the raw scores themselves.
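As a minimal sketch of reading one of these CSVs in Python, the snippet below parses each line into a small record. It is not part of the released code. The column order (two page IDs, two surface forms, mean, standard deviation, then raw scores in the trailing columns), the presence of a header row, and the names PairRecord and read_pairs are all assumptions made for illustration; check them against the actual files before relying on this.

```python
import csv
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class PairRecord:
    """One entity pair; field names are hypothetical, not taken from the files."""
    page_id_1: str
    page_id_2: str
    surface_1: str
    surface_2: str
    mean_score: float
    std_score: float
    raw_scores: List[float]


def read_pairs(handle: TextIO, has_header: bool = True) -> Iterator[PairRecord]:
    """Yield one PairRecord per CSV line.

    Assumed column order: ID 1, ID 2, surface form 1, surface form 2,
    mean score, standard deviation, then one raw score per trailing column.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the (assumed) header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        yield PairRecord(
            page_id_1=row[0],
            page_id_2=row[1],
            surface_1=row[2],
            surface_2=row[3],
            mean_score=float(row[4]),
            std_score=float(row[5]),
            # Everything after the sixth column is treated as a raw score.
            raw_scores=[float(x) for x in row[6:] if x.strip()],
        )
```

To use it, open one of the CSVs from the dataset folder with open(path, newline="", encoding="utf-8") and pass the file handle to read_pairs. If the files turn out to have no header row, pass has_header=False.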
The MTurk interface we used for our HITs is in the interface folder, or you can see live demos here:
We have also provided the original CSV files we used to generate our HITs in MTurk, in the mturk_datafiles folder.
If you use this dataset in your own work, please cite the paper above:
@inproceedings{Newman-Griffis2018Repl4NLP,
author = {Newman-Griffis, Denis and Lai, Albert M. and Fosler-Lussier, Eric},
title = {Jointly Embedding Entities and Text with Distant Supervision},
booktitle = {Proceedings of the 3rd Workshop on Representation Learning for NLP (Repl4NLP)},
year = {2018}
}