ConVec: Vector Embedding of Wikipedia Concepts and Entities

The WikipediaParser folder contains the code to extract and prepare the Wikipedia dump.

Links to the pre-trained concept, word and entity vectors produced by this project are listed below:

Wikipedia ID-to-title map (maps the Wikipedia ID of a page to its title): https://web.cs.dal.ca/~sherkat/Files/ID_title_map.zip (210MB)
Trained concepts, entities and words (concepts and entities are represented by their Wikipedia ID):
ConVec: https://web.cs.dal.ca/~sherkat/Files/WikipediaClean5Negative300Skip10.zip (3.3GB)
ConVec Fine Tuned: https://dalu-my.sharepoint.com/personal/eh379022_dal_ca/_layouts/15/guestaccess.aspx?docid=0602754033d8a4e65aa8c841aa2efc491&authkey=AZGkUWbSwjiO4SrsPub0LBI (6.86GB)
ConVec Heuristic: https://dalu-my.sharepoint.com/personal/eh379022_dal_ca/_layouts/15/guestaccess.aspx?docid=0b42ef5ba2d3247ccab2c10d5c1691a47&authkey=AYZLBIfjQLta9aAmSaRzymY (3.75GB)
ConVec Only Anchors: https://dalu-my.sharepoint.com/personal/eh379022_dal_ca/_layouts/15/guestaccess.aspx?docid=0592fd0392e1f4ec89d84a512e6482d03&authkey=AVpBqVspTRnF0Ra6xedMVIk (3.57GB)
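This README does not document the layout of the downloaded files. The sketch below shows how a concept could be looked up by its Wikipedia ID and matched to its nearest neighbours. It assumes that the vector files use the plain word2vec text format (a header line `<vocab_size> <dimension>`, followed by one token and its values per line) and that the ID-to-title map stores one `<id><TAB><title>` pair per line. Both formats, the function names and the sample data are hypothetical and only serve as illustration.

```python
import io
import math
from typing import Dict, List, TextIO, Tuple

# ASSUMPTION: word2vec *text* format. The first line is "<vocab_size> <dimension>",
# and every other line is "<token> <v1> ... <vdim>". Concepts/entities are assumed
# to appear as their Wikipedia ID (e.g. "12345").
def load_word2vec_text(stream: TextIO) -> Dict[str, List[float]]:
    header = stream.readline().split()
    dim = int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

# ASSUMPTION (hypothetical layout): one "<id>\t<title>" pair per line.
def load_id_title_map(stream: TextIO) -> Dict[str, str]:
    mapping: Dict[str, str] = {}
    for line in stream:
        page_id, sep, title = line.rstrip("\n").partition("\t")
        if sep:
            mapping[page_id] = title
    return mapping

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def most_similar(vectors: Dict[str, List[float]], key: str,
                 topn: int = 3) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours of `key` by cosine similarity."""
    query = vectors[key]
    scores = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != key]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

if __name__ == "__main__":
    # Tiny hypothetical sample standing in for the real downloads.
    sample_vectors = io.StringIO(
        "3 2\n"
        "101 1.0 0.0\n"
        "202 0.9 0.1\n"
        "303 0.0 1.0\n"
    )
    sample_map = io.StringIO("101\tConcept A\n202\tConcept B\n303\tConcept C\n")
    vecs = load_word2vec_text(sample_vectors)
    titles = load_id_title_map(sample_map)
    for page_id, score in most_similar(vecs, "101", topn=2):
        # Fall back to the raw ID if no title is known.
        print(titles.get(page_id, page_id), round(score, 3))
```

The brute-force search is only meant for small experiments. For the multi-gigabyte files above, a library such as gensim (`KeyedVectors.load_word2vec_format`, with `binary=True` if the files turn out to be in binary word2vec format) would likely be more practical, though the actual file format should be checked first.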

Please cite the following paper if you use the code, datasets or vector embeddings:

@inproceedings{NLDBSherkat2017,
   Author    = {Ehsan Sherkat and Evangelos Milios},
   Title     = {Vector Embedding of Wikipedia Concepts and Entities},
   Year      = {2017},
   link      = {https://arxiv.org/abs/1702.03470},
   Url       = {https://doi.org/10.1007/978-3-319-59569-6_50},
   doi       = {10.1007/978-3-319-59569-6_50},
   booktitle = {Natural Language Processing and Information Systems, NLDB 2017}
}