SemanticCorrelation

A playground for computing semantic similarity measures between statistically correlated concepts.

What is this?

A script that reads concept descriptions from (Linked Statistical) datasets and outputs the semantic similarity of every possible pair of concepts, computed with Latent Semantic Indexing (LSI).

Why?

It is part of a broader effort to study the relationship between the statistical correlation and the semantic similarity of datasets.

How to use it?

./semanticCorrelation.py [-e <endpoint> | -i <input.csv>] -o <output.csv> [-v] [-t <numtopics>] [-it <numiterations>]

  • <numtopics> is the number of topics for LSI (default 200)
  • <numiterations> is the number of power iterations for LSI (default 2). More iterations improve precision at the cost of a longer running time
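As a rough illustration of the command line above, the options could be declared with `argparse` along these lines. This is a sketch, not the script's real parser: the `dest` names, the reading of `-v` as a verbosity switch, and the choice to require `-o` are assumptions. The defaults 200 and 2 come from the list above; gensim's `LsiModel` has `num_topics` and `power_iters` parameters with the same defaults, which is presumably where these values end up.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented usage line (hypothetical names)."""
    parser = argparse.ArgumentParser(prog="semanticCorrelation.py")

    # -e and -i are alternatives: read from a SPARQL endpoint or from a CSV file.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("-e", dest="endpoint", metavar="<endpoint>")
    source.add_argument("-i", dest="input_csv", metavar="<input.csv>")

    # Assumption: the output file is mandatory, as the usage line suggests.
    parser.add_argument("-o", dest="output_csv", metavar="<output.csv>", required=True)
    # Assumption: -v toggles verbose output.
    parser.add_argument("-v", dest="verbose", action="store_true")
    # LSI parameters with the documented defaults.
    parser.add_argument("-t", dest="num_topics", metavar="<numtopics>", type=int, default=200)
    parser.add_argument("-it", dest="num_iterations", metavar="<numiterations>", type=int, default=2)
    return parser


# Parse the arguments of the documented example invocation.
args = build_parser().parse_args(
    ["-e", "http://worldbank.270a.info/sparql", "-o", "similarities.csv", "-v", "-t", "300"]
)
print(args.endpoint, args.output_csv, args.num_topics, args.num_iterations)
```

Here `-t 300` overrides the default number of topics, while the number of power iterations stays at 2 because `-it` is not given.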

Example

./semanticCorrelation.py -e http://worldbank.270a.info/sparql -o similarities.csv -v -t 300

Dependencies

  • Python 2.7.5
  • NLTK 2.0.4
  • SPARQLWrapper 1.5.2
  • gensim 0.10.0

Disclaimer

Author: Albert Meroño-Peñuela

License: Apache License, Version 2.0