Playground to provide semantic similarity measures between statistically correlated concepts
A script that reads concept descriptions in (Linked Statistical) datasets and outputs the semantic similarity, computed with LSI (Latent Semantic Indexing), of every possible pair of concepts.
It belongs to a broader effort to study the relationship between correlation and semantic similarity of datasets.
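To make the "all possible pairs" output concrete, here is a minimal sketch of the pairwise comparison. It is not the script's implementation: it swaps LSI for plain cosine similarity over raw term counts so that it runs with the standard library only, and the concept names and descriptions are hypothetical.

```python
import itertools
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarities(descriptions):
    """Return (concept_a, concept_b, similarity) for every unordered pair."""
    # Stand-in for the LSI projection: bag-of-words counts per description.
    vectors = {name: Counter(text.lower().split())
               for name, text in descriptions.items()}
    return [(a, b, cosine(vectors[a], vectors[b]))
            for a, b in itertools.combinations(sorted(vectors), 2)]


# Hypothetical concept descriptions, for illustration only.
descriptions = {
    "gdp": "gross domestic product per capita",
    "gni": "gross national income per capita",
    "co2": "carbon dioxide emissions",
}
rows = pairwise_similarities(descriptions)
```

Each element of `rows` corresponds to one line of the output CSV under this assumption: two concept identifiers and their similarity score. With n concepts there are n(n-1)/2 such rows.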
./semanticCorrelation.py [-e <endpoint> | -i <input.csv>] -o <output.csv> [-v] [-t <numtopics>] [-it <numiterations>]
<numtopics> is the number of topics for LSI (default 200).
<numiterations> is the number of power iterations for LSI (default 2). More iterations increase precision but make the computation slower.
./semanticCorrelation.py -e http://worldbank.270a.info/sparql -o similarities.csv -v -t 300
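The options above could be parsed along the following lines. This is a sketch with `argparse`, not the script's actual code: the `dest` names and help strings are hypothetical, and it assumes that exactly one of -e and -i must be given.

```python
import argparse


def build_parser():
    """Build a parser mirroring the usage line (option semantics assumed)."""
    parser = argparse.ArgumentParser(prog="semanticCorrelation.py")
    # Assumption: the data source is either a SPARQL endpoint or a CSV file.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-e", dest="endpoint", help="SPARQL endpoint URL")
    source.add_argument("-i", dest="input", help="input CSV file")
    parser.add_argument("-o", dest="output", required=True,
                        help="output CSV file")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="verbose output")
    parser.add_argument("-t", dest="numtopics", type=int, default=200,
                        help="number of LSI topics")
    parser.add_argument("-it", dest="numiterations", type=int, default=2,
                        help="number of LSI power iterations")
    return parser


# Parse the example invocation shown above.
args = build_parser().parse_args(
    ["-e", "http://worldbank.270a.info/sparql",
     "-o", "similarities.csv", "-v", "-t", "300"]
)
```

Here `args.numtopics` is 300 and `args.numiterations` falls back to the default of 2, since -it is not passed.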
- Python 2.7.5
- NLTK 2.0.4
- SPARQLWrapper 1.5.2
- gensim 0.10.0
Author: Albert Meroño-Peñuela
License: Apache License, Version 2.0