
NLP @ TU Wien


TUW-NLP

NLP utilities developed at TU Wien Informatics.

Install and Quick Start

Install the tuw-nlp package from PyPI with pip:

pip install tuw-nlp

Or install from source by running this in the repository root:

pip install -e .

On Windows and macOS, you might also need to install Graphviz manually.

A few additional steps are needed before you can use the library.

Download the NLTK stopwords:

import nltk
nltk.download('stopwords')

Download the Stanza models for Universal Dependencies (UD) parsing:

import stanza

stanza.download("en")
stanza.download("de")

Finally, download ALTO and the tuw_nlp dictionaries:

import tuw_nlp

tuw_nlp.download_alto()
tuw_nlp.download_definitions()

Make sure Java is installed on your system; the parser requires it.
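Before running the parser, it can help to verify that the dependencies mentioned above are present. The following is a minimal sketch using only the standard library. It assumes that Java is reachable as the `java` executable and Graphviz as `dot` on your `PATH`; the helper name `check_dependencies` is hypothetical and not part of tuw_nlp.

```python
import shutil
from importlib import metadata


def check_dependencies():
    """Report which external tools and Python packages are available.

    Hypothetical helper: it only looks for executables on PATH and for
    installed distributions; it does not check versions or downloaded models.
    """
    report = {}
    # Java is needed by the parser; Graphviz provides the `dot` executable.
    for tool in ("java", "dot"):
        report[tool] = shutil.which(tool) is not None
    # Python distributions installed by pip (names as published on PyPI).
    for dist in ("tuw-nlp", "nltk", "stanza"):
        try:
            metadata.version(dist)
            report[dist] = True
        except metadata.PackageNotFoundError:
            report[dist] = False
    return report


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:10s} {'found' if ok else 'MISSING'}")
```

A missing entry points to the step above that still needs to be done; the NLTK, Stanza and ALTO downloads themselves are not checked.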

Then you can parse a sentence as simply as this:

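from tuw_nlp.grammar.text_to_4lang import TextTo4lang

# Build an English pipeline; "en_nlp_cache" is the file used to cache parser output
tfl = TextTo4lang("en", "en_nlp_cache")

# depth=1 expands the graph with one level of dictionary definitions; substitute=False skips substitution
fl_graphs = list(tfl("brown dog", depth=1, substitute=False))

# fl_graphs now contains networkx graph objects that can be inspected directly
fl_graphs[0].nodes(data=True)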

For more examples, see the Jupyter notebook under notebooks/experiment.

Services

We also provide services built on top of the package. To learn more, see the services directory.

Text_to_4lang service

To run a browser-based demo (also available online) for building graphs from raw text, first start the graph-building service:

python services/text_to_4lang/backend/service.py

Then run the frontend with this command:

streamlit run services/text_to_4lang/frontend/demo.py

In the demo you can parse English and German sentences and try out several of the algorithms implemented for our graphs, such as expand, substitute and append_zero_paths.
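To give an intuition for what expansion does, here is a toy sketch of the general idea behind 4lang definition expansion: each concept in the graph is enriched with the edges of its dictionary definition, repeated up to a given depth. Everything here is an assumption for illustration. Graphs are plain sets of `(source, label, target)` triples, the two-entry dictionary is made up, and `expand` is a hypothetical function, not the tuw_nlp implementation; substitute and append_zero_paths are not modelled.

```python
# Made-up definition graphs, keyed by the concept they define.
# Label "0" links a concept to an attribute, as in "dog -0-> brown".
TOY_DEFINITIONS = {
    "dog": {("dog", "0", "animal"), ("dog", "0", "domestic")},
    "brown": {("brown", "0", "colour")},
}


def nodes(edges):
    """Return all node names appearing in a set of (src, label, tgt) edges."""
    return {n for src, _, tgt in edges for n in (src, tgt)}


def expand(edges, definitions, depth=1):
    """Add definition edges for every concept, repeating `depth` times.

    Nodes that were already expanded are not expanded again, so cycles
    in the dictionary cannot make the loop grow forever.
    """
    result = set(edges)
    expanded = set()
    for _ in range(depth):
        frontier = nodes(result) - expanded
        for concept in frontier:
            result |= definitions.get(concept, set())
        expanded |= frontier
    return result


graph = {("dog", "0", "brown")}
print(sorted(expand(graph, TOY_DEFINITIONS, depth=1)))
# [('brown', '0', 'colour'), ('dog', '0', 'animal'), ('dog', '0', 'brown'), ('dog', '0', 'domestic')]
```

Tracking already-expanded nodes keeps each concept's definition from being added twice at higher depths; a real dictionary is far larger, so depth quickly dominates graph size.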

Modules

text

General text processing utilities, including:

  • segmentation: Stanza-based processors for word- and sentence-level segmentation
  • patterns: various patterns for text processing tasks
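The segmentation processors are built on Stanza. To see what sentence- and word-level segmentation produce without installing any models, here is a deliberately naive, regex-based stand-in using only the standard library. It is a swapped-in technique for illustration, not how tuw_nlp segments text, and it will wrongly split after abbreviations such as "Dr." that a trained model handles. The function names are hypothetical.

```python
import re

# Sentence boundary: whitespace after ., ! or ? that precedes an uppercase letter.
SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")
# A token is either a run of word characters or a single non-space symbol.
TOKEN = re.compile(r"\w+|[^\w\s]")


def split_sentences(text):
    """Naively split text into sentences at terminal punctuation."""
    return [s for s in SENT_BOUNDARY.split(text.strip()) if s]


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return TOKEN.findall(sentence)


text = "The brown dog barks. It is loud!"
for sent in split_sentences(text):
    print(tokenize(sent))
# ['The', 'brown', 'dog', 'barks', '.']
# ['It', 'is', 'loud', '!']
```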

graph

Tools for working with graphs, including:

  • utils: misc utilities for working with graphs

grammar

Tools for generating and using grammars, including:

  • alto: tools for interfacing with the ALTO tool
  • irtg: class for representing Interpreted Regular Tree Grammars
  • lexicon: rule lexica for building lexicalized grammars
  • ud_fl: grammar-based mapping of Universal Dependencies to 4lang semantic graphs
  • utils: misc utilities for working with grammars
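As a rough intuition for what a UD-to-4lang mapping does, the sketch below turns a few UD dependency triples into 4lang-style edges: 0 for attribution, 1 and 2 for the first and second argument of a predicate. This is a hypothetical, heavily simplified rule table written for illustration; the actual ud_fl module is grammar-based and covers far more relations. Relation names follow UD v2 (`amod`, `nsubj`, `obj`), the parse is hand-written, and the function name `ud_to_4lang` is an assumption.

```python
# Hypothetical rule table: UD relation -> 4lang edge label (head -> dependent).
RULES = {
    "amod": "0",   # "brown dog":  dog -0-> brown (attribution)
    "nsubj": "1",  # "dog chases": chase -1-> dog (first argument)
    "obj": "2",    # "chases cat": chase -2-> cat (second argument)
}


def ud_to_4lang(dependencies):
    """Map (head, relation, dependent) UD triples to 4lang-style edges.

    Relations without a rule (e.g. det) are skipped.
    """
    edges = []
    for head, relation, dependent in dependencies:
        label = RULES.get(relation)
        if label is not None:
            edges.append((head, label, dependent))
    return edges


# Hand-written, lemmatized UD parse of "The brown dog chases the cat."
parse = [
    ("dog", "det", "the"),
    ("dog", "amod", "brown"),
    ("chase", "nsubj", "dog"),
    ("cat", "det", "the"),
    ("chase", "obj", "cat"),
]
print(ud_to_4lang(parse))
# [('dog', '0', 'brown'), ('chase', '1', 'dog'), ('chase', '2', 'cat')]
```

A flat lookup table like this cannot handle constructions that need context (passives, coordination, relative clauses), which is one reason to express the mapping as a grammar instead.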

Contributing

We welcome all contributions! Please fork this repository and create a branch for your modifications. We suggest getting in touch with us first, either by opening an issue or by emailing Gabor Recski or Adam Kovacs at firstname.lastname@tuwien.ac.at.

Citing

If you use the library, please cite our paper:

@inproceedings{Recski:2021,
  title     = {Explainable Rule Extraction via Semantic Graphs},
  author    = {Recski, Gabor and Lellmann, Bj{\"o}rn and Kovacs, Adam and Hanbury, Allan},
  booktitle = {{Proceedings of the Fifth Workshop on Automated Semantic Analysis of Information in Legal Text (ASAIL 2021)}},
  publisher = {{CEUR Workshop Proceedings}},
  address   = {São Paulo, Brazil},
  pages     = {24--35},
  url       = {http://ceur-ws.org/Vol-2888/paper3.pdf},
  year      = {2021}
}

License

MIT license