Natural Language Tool for Distinction Mining in Texts

Primary language: Python. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

Distinctiopus

A tool for mining distinctions based on the text-linguistic and rhetorical features of a text.

For example, when the tool reads the Microsoft License Agreement and learns that in some cases you have a limited warranty and in other cases no warranty, it looks for the concrete cases in which you have a warranty.

To see how it works

The output consists of subgraphs that represent distinctions. To give a better picture of what this looks like:

Getting Started

First, install some Natural Language Processing tools together with the AI models that belong to them.

Prerequisites

You are going to need:

It is important that you install the dependencies in a virtualenv, because of an incompatibility between spaCy versions. For preprocessing with the Prepr0cessor you need spacy==2.0.12. For Distinctiopus you need spacy==2.1.0a4 (= spacy-nightly), which must be silently installed over spacy 2.0.18. The reason is that AllenAI uses spaCy's sentence segmentation but checks the installed version. This works if you install spacy-nightly after spacy==2.0.18, so that it overrides it.
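
Because the two environments need different spaCy versions, it can help to check which version is active before running anything. The following is a minimal sketch, not part of Distinctiopus: the helper names are hypothetical, it assumes Python 3.8 or newer for `importlib.metadata`, and it assumes spacy-nightly is registered under its own distribution name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions quoted in this README; the names below are illustrative only.
REQUIRED_SPACY = {
    "preprocessor": "2.0.12",    # environment for the preprocessing tool
    "distinctiopus": "2.1.0a4",  # spacy-nightly, installed over spacy 2.0.18
}


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def spacy_matches(environment, found):
    """Compare a found version string with the one this README asks for."""
    return found == REQUIRED_SPACY[environment]


if __name__ == "__main__":
    # Assumption: spacy-nightly is a separate distribution, so check both names.
    found = installed_version("spacy-nightly") or installed_version("spacy")
    print("installed spaCy:", found)
    print("ok for Distinctiopus:", spacy_matches("distinctiopus", found))
```

Run it once inside each virtualenv. A mismatch only tells you that the active version differs from the one named above, not why.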

Use the model en_coref_sm in Prepr0cess0r.

For this, you can fetch the models with wget from the Distinctiopus home directory:

mkdir './others_models'
cd ./others_models
wget "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options.json"
wget "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5"
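
If wget is not available, the same two files could be fetched from Python. This is only a sketch using the standard library; the function names are hypothetical, and the URLs and the `others_models` directory are the ones from the commands above.

```python
import os
import urllib.request

BASE_URL = (
    "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/"
    "2x4096_512_2048cnn_2xhighway/"
)
FILES = [
    "elmo_2x4096_512_2048cnn_2xhighway_options.json",
    "elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5",
]


def model_targets(directory="others_models"):
    """Map each download URL to the local path it should be saved under."""
    return {BASE_URL + name: os.path.join(directory, name) for name in FILES}


def fetch_models(directory="others_models"):
    """Download the files that are not already present in the directory."""
    os.makedirs(directory, exist_ok=True)
    for url, path in model_targets(directory).items():
        if os.path.exists(path):
            continue  # skip files fetched on an earlier run
        urllib.request.urlretrieve(url, path)


if __name__ == "__main__":
    fetch_models()
```

Run it from the Distinctiopus home directory so that `others_models` ends up in the same place as with the wget commands.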

It is used because of WordNet, for fetching antonyms and abstractness.

If you didn't install Neo4j with autostart, start it:

 sudo service neo4j start
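
To confirm that the service actually came up, a quick connectivity check may be useful. The sketch below is an assumption-laden example, not part of Distinctiopus: it assumes Neo4j runs on localhost with its default ports (7687 for Bolt, 7474 for HTTP), and `port_open` is a hypothetical helper.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed defaults: 7687 (Bolt) and 7474 (HTTP); adjust if you changed them.
    for name, port in (("bolt", 7687), ("http", 7474)):
        state = "reachable" if port_open("localhost", port) else "not reachable"
        print(f"neo4j {name} port {port}: {state}")
```

An open port only shows that something is listening there; it does not verify credentials or the database contents.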

If you want to mine distinctions in a text of your choice, you also have to preprocess the text to obtain a folder of CoNLL files for it. This is necessary to combine the best features of each of these tools, because in other dimensions they may give less good results. This is possible with the text-preprocessor tool I built; look at that tool first to see how to use it.
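
To inspect such a folder before mining, a generic reader like the one below may help. It is a sketch under stated assumptions: the exact column layout produced by the preprocessor is not described here, so the code only assumes the common CoNLL conventions (tab-separated columns, one token per line, blank lines between sentences, `#` for comments) and a `.conll` file suffix. Both function names are hypothetical.

```python
import os


def read_conll_sentences(path):
    """Yield sentences as lists of token rows (each row a list of columns)."""
    sentence = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line.strip():
                if sentence:  # a blank line closes the current sentence
                    yield sentence
                    sentence = []
            elif not line.startswith("#"):  # skip comment lines
                sentence.append(line.split("\t"))
    if sentence:  # last sentence when the file lacks a trailing blank line
        yield sentence


def read_conll_folder(folder, suffix=".conll"):
    """Read every file with the given suffix, in sorted order, into a dict."""
    return {
        name: list(read_conll_sentences(os.path.join(folder, name)))
        for name in sorted(os.listdir(folder))
        if name.endswith(suffix)
    }
```

The result maps each file name to its list of sentences, which makes it easy to count sentences or spot empty files before starting a run.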

Installing

End with an example of getting some data out of the system or using it for a little demo

Running the tests

Explain how to run the automated tests for this system

Break down into end to end tests

Explain what these tests test and why

Give an example

And coding style tests

Explain what these tests test and why

Give an example

Deployment

To see the docs, go here:

Built With

Contributing

Versioning

Authors

Stefan Werner

See also the list of contributors who participated in this project.

License

Acknowledgments