NLP utilities developed at TUW informatics.
Install the tuw-nlp package from pip:
pip install tuw-nlp
Or install from source:
pip install -e .
On Windows and Mac, you might also need to install Graphviz manually.
You will also need some additional steps to use the library:
Download nltk stopwords:
import nltk
nltk.download('stopwords')
Download stanza models for UD parsing:
import stanza
stanza.download("en")
stanza.download("de")
And then finally download ALTO and tuw_nlp dictionaries:
import tuw_nlp
tuw_nlp.download_alto()
tuw_nlp.download_definitions()
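The download steps above can also be collected into a single setup function. The following is a minimal sketch that uses only the calls already shown; the function name `download_resources` and its `languages` parameter are illustrative assumptions and are not part of tuw_nlp.

```python
def download_resources(languages=("en", "de")):
    """Run the one-time downloads listed above.

    The function name and the `languages` parameter are hypothetical;
    the individual calls are the ones shown in this README.
    """
    # Imported lazily, so defining the function does not require the packages.
    import nltk
    import stanza
    import tuw_nlp

    # NLTK stopword lists
    nltk.download("stopwords")

    # stanza models for UD parsing, one download per language
    for lang in languages:
        stanza.download(lang)

    # ALTO and the tuw_nlp dictionaries
    tuw_nlp.download_alto()
    tuw_nlp.download_definitions()
```

Calling `download_resources()` once after installation should cover all of the steps above, assuming network access is available.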
Also, make sure Java is installed on your system; the parser requires it.
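A quick way to confirm that the external tools are reachable is to look them up on the PATH. This sketch assumes that the Java launcher is called `java` and that Graphviz provides the `dot` executable; the helper name `missing_tools` is made up for illustration.

```python
import shutil


def missing_tools(tools=("java", "dot")):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH")
```

An empty list only means the executables were found; it does not check their versions.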
Then you can parse a sentence as simply as:
from tuw_nlp.grammar.text_to_4lang import TextTo4lang
tfl = TextTo4lang("en", "en_nlp_cache")
fl_graphs = list(tfl("brown dog", depth=1, substitute=False))
# Each item in fl_graphs is a networkx graph object
fl_graphs[0].nodes(data=True)
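Because the result is a networkx graph, the usual networkx accessors apply. The helper below is a small sketch for listing a graph's nodes and edges; `describe_graph` is a hypothetical name. It relies only on the standard `nodes(data=True)` and `edges(data=True)` methods and makes no assumption about which attributes tuw_nlp stores on nodes or edges.

```python
def describe_graph(graph):
    """Return one text line per node and per edge of a networkx-style graph."""
    lines = []
    # nodes(data=True) yields (node_id, attribute_dict) pairs
    for node_id, attrs in graph.nodes(data=True):
        lines.append(f"node {node_id}: {attrs}")
    # edges(data=True) yields (source, target, attribute_dict) triples
    for source, target, attrs in graph.edges(data=True):
        lines.append(f"edge {source} -> {target}: {attrs}")
    return lines
```

For the example above, `print("\n".join(describe_graph(fl_graphs[0])))` would print every node and edge together with its attribute dictionary.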
For more examples, see the Jupyter notebook under notebooks/experiment.
We also provide services built on our package. To learn more, visit services.
To run a browser-based demo (also available online) for building graphs from raw texts, first start the graph building service:
python services/text_to_4lang/backend/service.py
Then run the frontend with this command:
streamlit run services/text_to_4lang/frontend/demo.py
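The frontend needs the backend to be running first. When scripting the two commands, one option is to wait until the backend accepts TCP connections before launching streamlit. The sketch below uses only the standard library; `wait_for_port` is a hypothetical helper, and the host and port have to be taken from the service's own configuration, since they are not stated here.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A successful connection only shows that something is listening on the port, not that the service has finished loading its models.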
In the demo you can parse English and German sentences, and you can also try out several algorithms that our graphs implement, such as expand, substitute, and append_zero_paths.
General text processing utilities. Contains:
- segmentation: stanza-based processors for word and sentence level segmentation
- patterns: various patterns for text processing tasks
Tools for working with graphs. Contains:
- utils: misc utilities for working with graphs
Tools for generating and using grammars. Contains:
- alto: tools for interfacing with the ALTO tool
- irtg: class for representing Interpreted Regular Tree Grammars
- lexicon: rule lexica for building lexicalized grammars
- ud_fl: grammar-based mapping of Universal Dependencies to 4lang semantic graphs
- utils: misc utilities for working with grammars
We welcome all contributions! Please fork this repository and create a branch for your modifications. We suggest getting in touch with us first, either by opening an issue or by emailing Gabor Recski or Adam Kovacs (firstname.lastname@tuwien.ac.at).
If you use the library, please cite our paper:
@inproceedings{Recski:2021,
title={Explainable Rule Extraction via Semantic Graphs},
author={Recski, Gabor and Lellmann, Bj{\"o}rn and Kovacs, Adam and Hanbury, Allan},
booktitle={{Proceedings of the Fifth Workshop on Automated Semantic Analysis of Information in Legal Text (ASAIL 2021)}},
publisher={{CEUR Workshop Proceedings}},
address={São Paulo, Brazil},
pages={24--35},
url={http://ceur-ws.org/Vol-2888/paper3.pdf},
year={2021}
}
MIT license