repseval

Evaluating pre-trained context-independent word embeddings



Requirements

  • Python 3.6 or above
  • numpy
  • scipy
  • Perl 5 or above (for SemEval evaluation)
  • sklearn (for short-text classification evaluation)
  • pytorch

Run pip install -r requirements.txt to install these requirements.

Execution

Evaluating using Semantic Similarity Benchmarks

To evaluate on semantic similarity benchmarks, go to the src directory and execute

python evaluate.py -m lex -i wordRepsFile -o result.csv
  • -m option specifies the mode of operation: 'lex' to evaluate on semantic similarity benchmarks, 'ana' on word analogy benchmarks, 'rel' on relation classification benchmarks, 'txt' on short-text classification benchmarks, 'psy' on psycholinguistic score prediction benchmarks, and 'pos' on part-of-speech tagging using the CoNLL-2003 dataset. You can combine multiple evaluations using a comma. For example, -m=lex,ana,rel,txt will perform all four of those evaluations in one go.

  • -d option is used to specify a directory that contains multiple word representation files.

  • -i specifies the input file from which the word representations are read. The file must be in the gensim text format: the first line contains the vocabulary size and the dimensionality as integers separated by a space, and each remaining line represents the word vector for a particular word. The first element in each line is the word, and the subsequent elements are the components of its vector, separated by spaces (see the sketch after this list).

  • -o is the name of the output file into which we will write the Pearson correlation coefficients and their significance values. This is a csv file.

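As an illustration of the expected input, the sketch below writes randomly initialised vectors in the gensim text format described above. The vocabulary, dimensionality, and file name are made up for illustration; embeddings saved with gensim's KeyedVectors.save_word2vec_format(..., binary=False) follow the same layout.

```python
# A minimal sketch (illustrative vocabulary and file name) of the gensim text
# format expected by the -i option.
import numpy as np

words = ["cat", "dog", "car"]    # toy vocabulary
dim = 50                         # dimensionality of the word vectors
vectors = np.random.randn(len(words), dim)

with open("wordRepsFile", "w") as f:
    f.write(f"{len(words)} {dim}\n")  # first line: vocabulary size and dimensionality
    for word, vec in zip(words, vectors):
        f.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")
```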

Installation

repseval depends on several packages, which can be installed via pip as follows:

pip install -r requirements.txt

The following semantic similarity benchmarks are available in this suite.

Dataset | Word pairs | Publication/distribution
Word Similarity 353 (WS) | 353 | Link
Miller-Charles (MC) | 28 | Miller, G. A. and Charles, W. G. (1991). Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1):1-28.
Rubenstein-Goodenough (RG) | 65 | Rubenstein, H. and Goodenough, J. B. (1965). Contextual correlates of synonymy. Communications of the ACM, 8(10):627-633.
MEN | 3000 | Link
Stanford Contextual Word Similarity (SCWS) | 2003 | Link
Rare Words (RW) | 2034 | Link
SimLex | 999 | Link
MTURK-771 | 771 | Link
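
For reference, these benchmarks are scored by correlating the cosine similarity between the two word vectors in each pair with the human rating for that pair. The sketch below illustrates that protocol with assumed inputs (an embeddings dict and a list of rated word pairs); repseval writes the resulting Pearson correlation coefficients and significance values to the output CSV, and its exact handling of out-of-vocabulary words may differ.

```python
# A minimal sketch of the word-similarity protocol. `embeddings` maps a word to a
# numpy vector; `benchmark` is a list of (word1, word2, human_rating) triples.
import numpy as np
from scipy.stats import pearsonr

def evaluate_similarity(embeddings, benchmark):
    predicted, gold = [], []
    for w1, w2, rating in benchmark:
        if w1 in embeddings and w2 in embeddings:  # skip out-of-vocabulary pairs
            u, v = embeddings[w1], embeddings[w2]
            predicted.append(float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))))
            gold.append(rating)
    return pearsonr(predicted, gold)  # (correlation coefficient, two-tailed p-value)
```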

The following word analogy benchmarks are available in this suite.

Dataset | Instances | Publication/distribution
SAT | 374 questions | Link
SemEval 2012 Task 2 | 79 paradigms | Link
Google dataset | 19558 questions (syntactic + semantic analogies) | Link
MSR dataset | 7999 syntactic questions | Link
  • There are several ways to compute the relational similarity between two pairs of words, such as CosAdd, CosMult, PairDiff, and CosSub. This tool uses CosAdd as the default method. You can try the other methods, which are also implemented in the tool; see the source code for more details and the sketch below for the idea behind CosAdd and PairDiff.
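
To illustrate, the sketch below gives the idea behind two of these measures for an analogy question a:b :: c:?, where the candidate d with the highest score is chosen as the answer. This is only a sketch of the measures themselves, not of the tool's implementation.

```python
# Illustrative implementations of two relational similarity measures.
# For a:b :: c:?, CosAdd scores a candidate d by cos(b - a + c, d);
# PairDiff compares the offset vectors (b - a) and (d - c).
import numpy as np

def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def cos_add(a, b, c, d):
    return cos(b - a + c, d)

def pair_diff(a, b, c, d):
    return cos(b - a, d - c)
```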

The following relation classification benchmarks are available in this suite.

Dataset | Word pairs | Publication/distribution
DiffVec | 12473 pairs | Link

The following short-text classification benchmarks are available in this suite.

Dataset | Instances (train/test) | Publication/distribution
TR (Stanford Sentiment Treebank) | train = 6001, test = 1821 | Link
MR (Movie Review Dataset) | train = 8530, test = 2132 | Link
CR (Customer Review Dataset) | train = 1196, test = 298 | Link
SUBJ (Subjectivity Dataset) | train = 8000, test = 2000 | Link
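
A common way to run these benchmarks is to represent each text as the average of its word embeddings and train a simple classifier (here scikit-learn's logistic regression) on the train split, reporting accuracy on the test split. The sketch below only illustrates that setup; the sentence representation and classifier actually used by the tool may differ, so check the source code.

```python
# A minimal sketch: averaged word embeddings as text features, logistic regression
# as the classifier. `embeddings` (word -> numpy vector) and the tokenised
# train/test data are assumed to be given.
import numpy as np
from sklearn.linear_model import LogisticRegression

def text_vector(tokens, embeddings, dim):
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def evaluate_text_classification(embeddings, dim, train, test):
    # train/test: lists of (tokens, label) pairs
    X_train = np.array([text_vector(t, embeddings, dim) for t, _ in train])
    y_train = [y for _, y in train]
    X_test = np.array([text_vector(t, embeddings, dim) for t, _ in test])
    y_test = [y for _, y in test]
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return clf.score(X_test, y_test)  # test accuracy
```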

Psycholinguistic Score Prediction

We use the input word embeddings as features in a neural network (containing a single hidden layer of 100 neurons with ReLU activation) to learn a regression model (no activation in the output layer). We use a randomly selected 80% of the words from the MRC database and the ANEW dataset to train a regression model for valence, arousal, dominance, concreteness, and imageability. We then measure the Pearson correlation between the predicted ratings and the human ratings on the remaining words and report the corresponding correlation coefficients. See Section 4.2 of this paper for further details regarding this setting.
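
The sketch below shows the regression setup described above in PyTorch: a single hidden layer of 100 ReLU units and a linear output trained with a mean-squared-error loss, followed by Pearson correlation on the held-out words. Data loading, the 80/20 split, and all hyper-parameters other than the hidden layer size are assumptions made for illustration.

```python
# A minimal sketch of the psycholinguistic rating regressor. X_* are float tensors
# of word embeddings with shape (n_words, dim); y_* are ratings of shape (n_words, 1).
import torch
import torch.nn as nn
from scipy.stats import pearsonr

def train_rating_regressor(X_train, y_train, dim, epochs=200, lr=1e-3):
    model = nn.Sequential(
        nn.Linear(dim, 100),  # single hidden layer of 100 neurons
        nn.ReLU(),
        nn.Linear(100, 1),    # no activation in the output layer
    )
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimiser.zero_grad()
        loss = loss_fn(model(X_train), y_train)
        loss.backward()
        optimiser.step()
    return model

def pearson_on_heldout(model, X_test, y_test):
    with torch.no_grad():
        predictions = model(X_test).squeeze(1).numpy()
    return pearsonr(predictions, y_test.squeeze(1).numpy())
```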

Part of Speech Tagging

pos.py can be used to evaluate pretrained word embeddings for Part-of-Speech (PoS) tagging on the CoNLL-2003 dataset. Specifically, we train an LSTM initialised with the pretrained word embeddings, followed by a hidden layer (100 dimensions by default) and a softmax layer that classifies each word into one of the 47 PoS tags. The LSTM is trained on the standard train split of the CoNLL-2003 dataset and evaluated on its standard test split. Accuracy (the fraction of tokens whose PoS tag is predicted correctly) and macro-averaged precision, recall, and F scores over the 47 PoS categories are reported as the evaluation metrics.
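
The sketch below outlines one way to build such a tagger in PyTorch. Here the 100-dimensional hidden layer is taken to be the LSTM hidden state and the softmax is folded into the cross-entropy loss; the exact architecture and training loop live in pos.py, so treat the names and defaults here as assumptions.

```python
# A minimal sketch of an LSTM PoS tagger initialised with pretrained embeddings.
# `pretrained` is a float tensor of shape (vocab_size, emb_dim).
import torch
import torch.nn as nn

class LSTMTagger(nn.Module):
    def __init__(self, pretrained, hidden_dim=100, num_tags=47):
        super().__init__()
        self.embed = nn.Embedding.from_pretrained(pretrained, freeze=False)
        self.lstm = nn.LSTM(pretrained.size(1), hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_tags)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> unnormalised tag scores: (batch, seq_len, num_tags)
        hidden_states, _ = self.lstm(self.embed(token_ids))
        return self.out(hidden_states)

# Training uses nn.CrossEntropyLoss over the 47 tags on the CoNLL-2003 train split;
# accuracy and macro-averaged precision/recall/F scores are computed on the test split.
```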