Gem-based-entity-knowledge-maintenance

Implementation of Taneva, B., & Weikum, G. (2013, October). Gem-based entity-knowledge maintenance. CIKM 2013

Usage:
-> Go to ./Code
-> python Main.py -s <seedFile> -i <inputFile> [-h] [-m] [-o ] [--alpha ] [--context <left|center|right>]

s: Compulsory parameter
< seedFile > : Path of the seed file. For m inputs, the file has 2*m lines. For each input, the first line contains the seed text, a list of space-separated words, and the second line contains the budget, a positive integer.
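
The seed file layout described above can be read with a few lines of Python. This is a minimal sketch, not the code in Main.py: the function name `parse_seed_lines` is hypothetical, and skipping blank lines is an assumption.

```python
def parse_seed_lines(lines):
    """Parse alternating seed-text / budget lines into (words, budget) pairs.

    Hypothetical helper for illustration; not taken from Main.py.
    """
    # Assumption: blank lines carry no meaning and can be skipped.
    rows = [line.strip() for line in lines if line.strip()]
    if len(rows) % 2 != 0:
        raise ValueError("expected 2*m lines: seed text followed by budget")

    seeds = []
    for i in range(0, len(rows), 2):
        words = rows[i].split()      # seed text: space-separated words
        budget = int(rows[i + 1])    # budget: positive integer
        if budget <= 0:
            raise ValueError("budget must be a positive integer")
        seeds.append((words, budget))
    return seeds


# Example content for a seed file with m = 2 inputs (made-up values).
example = [
    "barack obama president\n",
    "50\n",
    "angela merkel chancellor\n",
    "30\n",
]
parsed = parse_seed_lines(example)
```

To read an actual file, pass the open file object, for example `parse_seed_lines(open("../seed.txt"))`.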

i: Compulsory parameter
< inputFile > : Path of the data file, which contains the text repository.

m: To ensure diversity among text portions (gems), an MMR (Maximal Marginal Relevance) based approach is used. First, gems are extracted for twice the required budget. The relevance score of each gem is then set to its Jaccard similarity with the seed text, and the overlap between gems is also measured by Jaccard similarity. Finally, MMR is applied to select a subset of the extracted gems.
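
The selection step described above can be sketched as a greedy MMR loop over word sets. This is an illustration under stated assumptions, not the repository's implementation: the names `jaccard` and `mmr_select`, the trade-off weight `lam` (unrelated to the --alpha option), and the use of a gem count `k` as the stopping criterion are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of words."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def mmr_select(seed, gems, k, lam=0.5):
    """Greedily pick up to k gems, trading relevance against redundancy.

    lam is a hypothetical trade-off weight: 1.0 uses relevance only,
    0.0 uses redundancy only.
    """
    # Relevance of each gem: Jaccard similarity with the seed text.
    relevance = [jaccard(gem, seed) for gem in gems]
    selected = []
    remaining = list(range(len(gems)))

    while remaining and len(selected) < k:
        def score(i):
            # Redundancy: highest Jaccard overlap with any gem already chosen.
            redundancy = max(
                (jaccard(gems[i], gems[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    return [gems[i] for i in selected]


seed = "gem entity knowledge".split()
gems = [
    ["gem", "entity", "knowledge", "base"],
    ["gem", "entity", "knowledge", "base"],  # exact duplicate of the first gem
    ["entity", "news", "update"],
]
chosen = mmr_select(seed, gems, k=2)
```

In this toy input the duplicate gem is skipped in favour of the less relevant but non-overlapping third gem, which is the diversity effect the -m option is meant to provide.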

alpha: A higher alpha favors the selection of contiguous words. This may be desirable for capturing novel information about the seed text that lies between two sections which standard text similarity measures rate as highly related to the seed text.

context: One of 'left', 'center', or 'right'. 'left' means that the context of a word is determined by a set of words to its left in the sentence. 'center' and 'right' are defined analogously.
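
The three context modes can be illustrated with a small helper. This is a sketch under assumptions: the function name `context_words`, the `window` size, and the reading of 'center' as a window on both sides of the word are hypothetical and may differ from what Main.py does.

```python
def context_words(sentence, index, mode, window=2):
    """Return the context of sentence[index] for the given mode.

    sentence is a list of words; window (hypothetical) is the number of
    words taken on each relevant side.
    """
    if not 0 <= index < len(sentence):
        raise IndexError("index out of range")
    if mode == "left":
        lo, hi = index - window, index
    elif mode == "right":
        lo, hi = index + 1, index + 1 + window
    elif mode == "center":
        # Assumption: 'center' takes words on both sides of the target word.
        lo, hi = index - window, index + 1 + window
    else:
        raise ValueError("mode must be 'left', 'center', or 'right'")

    lo = max(lo, 0)  # clip at the start of the sentence
    # Slicing clips at the end of the sentence; the word itself is excluded.
    return [w for j, w in enumerate(sentence[lo:hi], start=lo) if j != index]


words = "the quick brown fox jumps over".split()
left = context_words(words, 3, "left")      # context of "fox"
center = context_words(words, 3, "center")
right = context_words(words, 3, "right")
```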

Example:
-> python Main.py -i ../../data.txt -s ../seed.txt -m