Python coherence evaluation tool using Stanford's CoreNLP.
This repository is designed for entity-based coherence.
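For readers new to the idea, entity-based coherence (see the Barzilay and Lapata reference) represents a document as an entity grid: each entity gets a syntactic role per sentence, and the frequencies of role transitions between adjacent sentences serve as features. The sketch below is only an illustration of that idea, not this repository's API. The function name, the grid layout and the hand-assigned roles are all assumptions made for the example.

```python
from collections import Counter
from itertools import product

ROLES = ("S", "O", "X", "-")  # subject, object, other, absent


def transition_probabilities(grid, length=2):
    """Return the probability of every role transition of the given length.

    `grid` maps each entity to its role in every sentence, in document order.
    (Hypothetical helper for illustration only.)
    """
    counts = Counter()
    for roles in grid.values():
        # Slide a window of `length` sentences down each entity column.
        for i in range(len(roles) - length + 1):
            counts[tuple(roles[i:i + length])] += 1
    total = sum(counts.values())
    return {t: (counts[t] / total if total else 0.0)
            for t in product(ROLES, repeat=length)}


# Hypothetical three-sentence document with hand-assigned roles.
grid = {
    "Microsoft": ["S", "O", "S"],
    "market":    ["X", "-", "-"],
    "earnings":  ["-", "S", "O"],
}
probs = transition_probabilities(grid)
print(probs[("S", "O")])
```

In practice the roles would come from a parser such as CoreNLP rather than being written by hand.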
Running your own CoreNLP server is highly recommended if you want to test coherence with this repository.
Download the latest version of Stanford CoreNLP (3.9.2) and start a local server (requires Java 1.8+) with:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer
A demo then becomes available at localhost:9000, which visualizes Stanford CoreNLP's sophisticated annotations for English documents.
Stanford also maintains an online demo.
If you need to annotate many documents, set up your own local server. If you just want to test a few documents without downloading CoreNLP, you can set the environment variable CORENLP_URL to point at an existing server (e.g. http://corenlp.run/; don't forget the http).
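As a rough sketch of what the CORENLP_URL setting implies, the snippet below resolves the server address from the environment, falls back to the local server on port 9000, and builds an annotation request for the CoreNLP server's HTTP interface. The function names and the chosen annotators are assumptions for illustration, and the code uses only the standard library instead of requests. How this repository reads the variable internally may differ.

```python
import json
import os
import urllib.parse
import urllib.request

DEFAULT_URL = "http://localhost:9000"  # address of a locally started server


def corenlp_url():
    """Return the server URL from CORENLP_URL, falling back to localhost."""
    url = os.environ.get("CORENLP_URL", DEFAULT_URL).rstrip("/")
    if not url.startswith(("http://", "https://")):
        raise ValueError("CORENLP_URL must include the scheme, e.g. http://")
    return url


def build_request(text, annotators="tokenize,ssplit,pos,parse"):
    """Build a POST request asking the server for JSON annotations."""
    props = json.dumps({"annotators": annotators, "outputFormat": "json"})
    query = urllib.parse.urlencode({"properties": props})
    return urllib.request.Request(
        f"{corenlp_url()}/?{query}", data=text.encode("utf-8"), method="POST")


def annotate(text):
    """Send text to the server and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(text), timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling annotate("Some text.") requires a reachable server. The scheme check mirrors the reminder above that the URL must start with http.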
If you are using Windows, make sure you have installed a scientific Python distribution such as Anaconda (if you want many scientific packages for future use) or Miniconda (if you want to save disk space), which I strongly recommend. In fact, whatever OS you use, installing pre-built binaries is recommended over building the packages yourself.
The requirements are nltk, numpy, pandas, requests, scipy and scikit-learn.
If you have installed Anaconda or Miniconda, create the environment with
conda create -n coheoka --file requirements.txt
and activate it by typing activate coheoka on Windows or source activate coheoka on Linux.
See the conda documentation for more details.
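Once the environment is active, a quick way to confirm that the listed requirements are importable is sketched below. The helper name is hypothetical and not part of this repository. Note that scikit-learn is imported under the module name sklearn.

```python
from importlib.util import find_spec

# Package name -> import name for the requirements listed above.
REQUIRED = {
    "nltk": "nltk",
    "numpy": "numpy",
    "pandas": "pandas",
    "requests": "requests",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose modules cannot be found."""
    return [pkg for pkg, module in required.items()
            if find_spec(module) is None]


# An empty list means every requirement is importable.
print(missing_packages())
```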
- Barzilay, R., & Lapata, M. (2008). Modeling local coherence: An entity-based approach. Computational Linguistics, 34(1), 1-34.
- Lapata, M., & Barzilay, R. (2005, July). Automatic evaluation of text coherence: Models and representations. In IJCAI (Vol. 5, pp. 1085-1090).