SemanticCorrelation

A playground for computing semantic similarity measures between statistically correlated concepts.

What is this?

A script that reads concept descriptions from (Linked Statistical) datasets and outputs the semantic similarity of every possible pair of concepts, computed with Latent Semantic Indexing (LSI).
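To make the pairwise computation concrete, here is a minimal sketch of LSI similarity over all pairs. It assumes NumPy, raw term counts, and whitespace tokenization; the function name `lsi_pair_similarities` and the three sample descriptions are hypothetical, and the actual script may weight terms differently or rely on a dedicated LSI library.

```python
from itertools import combinations

import numpy as np

# Hypothetical concept descriptions; the real script reads them from a
# SPARQL endpoint or a CSV file.
DESCRIPTIONS = {
    "gdp": "gross domestic product per capita",
    "gni": "gross national income per capita",
    "mortality": "infant mortality rate",
}


def lsi_pair_similarities(docs, num_topics=2):
    """Return {(name_a, name_b): cosine similarity in LSI topic space}."""
    names = list(docs)
    tokens = [docs[name].lower().split() for name in names]

    # Build a term-by-document matrix of raw counts (an assumption).
    vocab = sorted({term for doc in tokens for term in doc})
    row = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(names)))
    for col, doc in enumerate(tokens):
        for term in doc:
            matrix[row[term], col] += 1.0

    # Truncated SVD: keep only the num_topics largest singular values.
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(num_topics, len(s))
    vectors = (s[:k, None] * vt[:k]).T  # one topic-space row per document

    # Normalize rows so that a dot product equals the cosine similarity.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    vectors = vectors / np.where(norms == 0, 1.0, norms)

    return {
        (names[a], names[b]): float(vectors[a] @ vectors[b])
        for a, b in combinations(range(len(names)), 2)
    }


if __name__ == "__main__":
    for pair, sim in lsi_pair_similarities(DESCRIPTIONS).items():
        print(pair, round(sim, 3))
```

With two topics, the two "per capita" descriptions land on the same topic and score close to 1, while the mortality description shares no terms with them and scores close to 0.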

It belongs to a broader effort to study the relationship between correlation and semantic similarity of datasets.

./semanticCorrelation.py [-e <endpoint> | -i <input.csv>] -o <output.csv> [-v] [-t <numtopics>] [-it <numiterations>]

<numtopics> is the number of topics for LSI (default: 200).
<numiterations> is the number of power iterations for LSI (default: 2). More iterations improve the accuracy of the result but increase the running time.
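The trade-off behind these two parameters can be illustrated with a randomized truncated SVD, the kind of decomposition where power iterations usually appear. This is a sketch under assumptions, not the script's code: the README does not say which LSI implementation is used (the defaults happen to match `num_topics=200` and `power_iters=2` in gensim's `LsiModel`), and `randomized_svd`, `oversample`, and `seed` are hypothetical names.

```python
import numpy as np


def randomized_svd(matrix, num_topics, num_iterations=2, oversample=10, seed=0):
    """Approximate the top num_topics singular triplets of matrix."""
    rng = np.random.default_rng(seed)
    rows, cols = matrix.shape
    k = min(num_topics + oversample, rows, cols)

    # Random projection: sample the range of the matrix.
    q, _ = np.linalg.qr(matrix @ rng.standard_normal((cols, k)))

    # Each power iteration multiplies by the matrix and its transpose,
    # sharpening the captured subspace at the cost of two extra passes.
    for _ in range(num_iterations):
        q, _ = np.linalg.qr(matrix.T @ q)
        q, _ = np.linalg.qr(matrix @ q)

    # Exact SVD of the small projected matrix, mapped back to full size.
    u_small, s, vt = np.linalg.svd(q.T @ matrix, full_matrices=False)
    u = q @ u_small
    return u[:, :num_topics], s[:num_topics], vt[:num_topics]
```

Every extra iteration costs two more multiplications by the full matrix, which is why accuracy and running time pull in opposite directions.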

./semanticCorrelation.py -e http://worldbank.270a.info/sparql -o similarities.csv -v -t 300
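For reference, the command line above could be parsed along the following lines. This is only a sketch using `argparse`; the script itself may parse its options differently. The `dest` names are hypothetical, requiring exactly one of `-e` or `-i` is an assumption, and so is reading `-v` as a verbose switch.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical names)."""
    parser = argparse.ArgumentParser(prog="semanticCorrelation.py")

    # Assumption: exactly one input source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-e", dest="endpoint", metavar="<endpoint>")
    source.add_argument("-i", dest="input_csv", metavar="<input.csv>")

    parser.add_argument("-o", dest="output_csv", metavar="<output.csv>",
                        required=True)
    # Assumption: -v switches on verbose output.
    parser.add_argument("-v", dest="verbose", action="store_true")
    parser.add_argument("-t", dest="num_topics", metavar="<numtopics>",
                        type=int, default=200)
    parser.add_argument("-it", dest="num_iterations",
                        metavar="<numiterations>", type=int, default=2)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Parsing the example invocation yields 300 topics and leaves the number of power iterations at its default of 2.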