Critical Information Extraction from Terms of Service Documents

Terms of Service (ToS) are legal agreements between users and service providers. To use a service, the user must accept its terms. However, because ToS documents are verbose and written in opaque legal jargon, users tend to accept them without fully understanding the agreement. As a result, users may take on obligations they would not knowingly accept, or be exposed to unfair terms and practices. This project aims to make users better informed by flagging potentially unfair clauses in a ToS and presenting the obligations it imposes on them.

This project's contributions beyond earlier research are:

An extensive comparison of Transformer-based embeddings (RoBERTa and XLNet) used with various deep learning models.
Treating user-obligation clauses as critical information and identifying them, in addition to unfair clauses.
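
To make the idea of a "user-obligation clause" concrete, here is a minimal sketch that flags clauses containing a user-directed obligation cue such as "you must" or "you agree to". This is a plain keyword heuristic written for illustration only; it is not the method implemented in Obligation_Detection, and the cue list and the function names `is_user_obligation` and `extract_obligations` are assumptions.

```python
import re

# Hypothetical cue patterns (an assumption for illustration, not the project's rules):
# a user-referring subject followed by an obligation phrase.
OBLIGATION_CUES = re.compile(
    r"\b(?:you|user|users)\s+"
    r"(?:must|shall|agree to|are required to|are responsible for)\b",
    re.IGNORECASE,
)


def is_user_obligation(clause: str) -> bool:
    """Return True if the clause contains a user-directed obligation cue."""
    return OBLIGATION_CUES.search(clause) is not None


def extract_obligations(clauses):
    """Keep only the clauses that look like obligations placed on the user."""
    return [clause for clause in clauses if is_user_obligation(clause)]


if __name__ == "__main__":
    sample = [
        "You must not upload unlawful content.",
        "We may terminate your account at any time.",
        "Users are responsible for keeping their password secure.",
    ]
    for clause in extract_obligations(sample):
        print(clause)
```

A heuristic like this misses obligations phrased without these cues and cannot judge fairness at all, which is one reason to prefer learned models for both tasks.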

Project Demo: link

Dataset

The ToS dataset created as part of the CLAUDETTE experimental study.

Experiments and Source Code

Topic	File location in Repository
Fairness Classification	src
Obligation Detection	Obligation_Detection
GRU with RoBERTa Embeddings Model Weights	model
BERT Double	fairness_classification/bert_double
Legal BERT	fairness_classification/legal_bert
Custom Legal BERT	fairness_classification/custom-legal-bert
SVM Models	fairness_classification/SVM
Embeddings Generation	fairness_classification/input_feature_generation
RNN Based Models	fairness_classification/rnn_models

Execution

Steps to execute

# install all necessary packages
pip install -r requirements.txt

# execute the fairness classification code
python3 src/main.py ./../examples/9gag.txt # sample clause files can be found in src/examples

# execute the obligation detection code
python3 Obligation_Detection/Obligations_v2.py input.txt
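
Both scripts take a text file of clauses as input. The sketch below shows one way to turn raw ToS text into such a file. It assumes one clause per line, which this README does not confirm, so check the samples in src/examples before relying on it. The sentence splitter is a naive regular expression, and the names `split_into_clauses` and `write_clause_file` are hypothetical helpers, not part of the repository.

```python
import re


def split_into_clauses(text: str) -> list[str]:
    """Naively split raw text into sentence-like clauses."""
    # Collapse line breaks and repeated spaces into single spaces.
    normalized = re.sub(r"\s+", " ", text).strip()
    if not normalized:
        return []
    # Split after ., ! or ? when the next token starts with an uppercase letter.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", normalized)
    return [part.strip() for part in parts if part.strip()]


def write_clause_file(text: str, path: str) -> int:
    """Write one clause per line (assumed input format) and return the clause count."""
    clauses = split_into_clauses(text)
    with open(path, "w", encoding="utf-8") as handle:
        for clause in clauses:
            handle.write(clause + "\n")
    return len(clauses)


if __name__ == "__main__":
    raw = "You agree to the terms.  We may change\nthem. Continued use means acceptance!"
    count = write_clause_file(raw, "input.txt")
    print(f"Wrote {count} clauses to input.txt")
```

The splitter will break on abbreviations such as "Inc." followed by a capitalized word, so a proper sentence tokenizer is a better choice for real documents.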

References

When does pretraining help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset - GitHub Code

CLAUDETTE: an Automated Detector of Potentially Unfair Clauses in Online Terms of Service

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

RoBERTa: A Robustly Optimized BERT Pretraining Approach

A machine learning-based approach to identify unlawful practices in online terms of service: analysis, implementation and evaluation

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Named Entity Recognition on legal text for secondary dataset

The cost of reading privacy policies

Contributors - Group 18

Aditya Ashok Dave (daveadit@usc.edu)
Akanksha Sanjay Nogaja (nogaja@usc.edu)
Lavina Lavakumar Agarwal (llagarwa@usc.edu)
Shreya Venkatesh Prabhu (prabhus@usc.edu)
Sai Sree Yoshitha Akunuri (akunuri@usc.edu)

daveaditya/fairness_classification_obligation_detection_in_terms_of_services