Experimental Natural Language Processing on Privacy Policies for Import into the Transparency Information Language
NER_NLTK_Spacy - Privacy Policies.ipynb   Jupyter notebook with the main experiments
- academy.dslrvideoshooter.com.txt        serialized output of the policy
- arstechnica.com.txt                     serialized output of the policy
- academymortgage.com.txt                 serialized output of the policy
- language detection.png                  screenshot of the notebook for language detection
- langs.json                              results of the language detection
- dist.png                                distribution of detected languages
- syntax-tree-visualizer.py               standalone syntax tree visualizer (a minimal sketch follows below)
- single-tree.svg                         exemplary syntax tree
- single-tree.pdf                         exemplary syntax tree
- media/                                  archived main experiments and drafts
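The standalone visualizer could be realized roughly as in the following minimal sketch, assuming spaCy's displacy renderer and the en_core_web_sm model; the example sentence is made up, and the actual script in the repository may differ.

```python
# Minimal sketch of a standalone syntax tree visualizer, assuming spaCy's
# displacy renderer and the small English model; the sentence is illustrative
# and not taken from one of the analyzed policies.
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("We share your IP address with our analytics partners.")

# Render the dependency tree of the sentence as SVG markup and save it.
svg = displacy.render(doc, style="dep", jupyter=False)
with open("single-tree.svg", "w", encoding="utf-8") as out:
    out.write(svg)
```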
Elias Grünewald
- users have certain rights, communicated as transparency information, but are not able to comprehend them
- if data is transferred to multiple parties, the resulting data flow network is not visible
- lack of a representation format for describing transparency information
- what should a transparency representation format look like?
- how to automatically extract transparency information?
- how to extract data flow networks?
- define transparency representation format
- make use of existing corpora
- Privacy policies, e.g. the OPP-115 corpus
- Transparency information keywords or categories: a list of (sensitive) personal data terms such as name, birthday, bank account details, picture, IP address, …
- Third parties: a list of the top N companies and institutions
- use NLP for semantics extraction (link each transparency keyword to a third party, e.g. by distance; see the sketch after this list)
- save the resulting n-tuples (incl. purpose, duration) to the previously defined representation format
- visualize data flow networks
- may extend Polisis framework
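A minimal sketch of the distance-based linking step described above, assuming spaCy's built-in NER for third-party (ORG) mentions and a PhraseMatcher over an illustrative personal data term list; the term list, the distance threshold, and the input file are assumptions, not the project's actual configuration:

```python
# Sketch: link personal data keywords to the closest third-party mention.
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en_core_web_sm")

# Illustrative term list; the project's actual keyword/category list would be plugged in here.
PERSONAL_DATA_TERMS = ["name", "birthday", "bank account details", "picture", "IP address"]

matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
matcher.add("PERSONAL_DATA", [nlp.make_doc(term) for term in PERSONAL_DATA_TERMS])

def link_keywords_to_third_parties(policy_text, max_token_distance=50):
    """Return (keyword, third party, token distance) tuples for nearby mentions."""
    doc = nlp(policy_text)
    third_parties = [ent for ent in doc.ents if ent.label_ == "ORG"]
    links = []
    for _, start, end in matcher(doc):
        keyword = doc[start:end]
        # link each keyword occurrence to the closest organisation mention
        closest = min(third_parties, key=lambda ent: abs(ent.start - start), default=None)
        if closest is not None and abs(closest.start - start) <= max_token_distance:
            links.append((keyword.text, closest.text, abs(closest.start - start)))
    return links

if __name__ == "__main__":
    with open("arstechnica.com.txt", encoding="utf-8") as f:
        for entry in link_keywords_to_third_parties(f.read()):
            print(entry)
```

The resulting tuples could then be enriched with purpose and duration before being written to the representation format.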
- the transparency representation format is defined as a JSON example/schema (an illustrative example follows after this list)
- make use of established NLP frameworks and services such as TensorFlow, PyTorch, the Google Natural Language API, or Amazon Comprehend
- common web technologies for visualization
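Purely as an illustration, one entry of such a JSON-based format might look like the following; all field names and values are assumptions, not the schema actually defined for the project:

```python
# Purely illustrative example of one serialized n-tuple; field names and
# values are assumptions, not the project's actual JSON schema.
import json

example_entry = {
    "dataCategory": "IP address",
    "thirdParty": "Example Analytics Inc.",  # hypothetical recipient
    "purpose": "web analytics",
    "storageDuration": "P6M",                # ISO 8601 duration (six months)
}

print(json.dumps(example_entry, indent=2))
```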
- evaluate on unseen privacy policies using precision, recall, and F1-score (see the sketch after this list)
- measure performance
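A minimal sketch of the metric computation, assuming gold annotations (y_true) and extracted labels (y_pred) are available as parallel lists; the labels shown are illustrative only:

```python
# Sketch of the evaluation step on unseen policies; values are illustrative.
from sklearn.metrics import precision_recall_fscore_support

y_true = ["IP address", "name", "O", "birthday"]
y_pred = ["IP address", "O", "O", "birthday"]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="micro", zero_division=0
)
print(f"precision={precision:.2f}  recall={recall:.2f}  f1={f1:.2f}")
```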