Official code release of our work on Intent Classification and Slot Filling for Privacy Policies.
- We propose a new dataset called PolicyIE in this work.
- PolicyIE provides annotations of privacy practices and text spans for sentences in policy documents.
- We refer to predicting the privacy practice as intent classification and to identifying the text spans as slot filling.
[NOTE] The PolicyIE dataset is available here.
Requirements:
- python>=3.6
- torch==1.5.1
- transformers==3.0.2
- fairseq==0.9.0
- seqeval==1.2.0
- pytorch-crf==0.7.2
To prepare the data, run:
cd data
bash prepare.sh
We studied two alternative modeling approaches as baselines in our work: sequence tagging and sequence-to-sequence (seq2seq) learning. We refer readers to the paper for more details about the models and experimental results.
# Input
[CLS] We may also use or display your username and icon or profile photo on marketing purpose or press releases .
# Type-I slot tagging output
Data-Collection-Usage B-DC.FPE O O B-Action O O B-DP.U B-DC.UOAP O B-DC.UOAP I-DC.UOAP I-DC.UOAP I-DC.UOAP O O O O O O O
# Type-II slot tagging output
Data-Collection-Usage O O O O O O O O O O O O O O B-P.AM I-P.AM I-P.AM I-P.AM I-P.AM O
- [Models] BiLSTM, Transformer, BERT, RoBERTa
- Implementations are available at https://github.com/wasiahmad/PolicyIE/tree/main/seqtag.
- Go to the `seqtag` directory and use the `run.sh` script for model training and evaluation. Run `bash run.sh -h` to learn about the command line arguments.
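To make the sequence tagging output concrete, here is a minimal sketch that turns one line of BIO tags back into labeled text spans. It is not the repository's decoding code; it assumes (as in the example above) that the tag aligned with `[CLS]` is the intent label and that a stray `I-` tag with no matching open span starts a new span. The function name `bio_to_spans` is hypothetical.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> Tuple[str, List[Tuple[str, str]]]:
    """Split one tagging output into (intent, [(slot_label, span_text), ...])."""
    intent = tags[0]  # assumption: the label aligned with [CLS] is the intent
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []
    for token, tag in zip(tokens[1:], tags[1:]):
        if tag.startswith("I-") and tag[2:] == label:
            words.append(token)  # extend the open span
            continue
        if label is not None:
            spans.append((label, " ".join(words)))  # close the open span
        if tag[:2] in ("B-", "I-"):
            label, words = tag[2:], [token]  # open a new span; a stray I- acts like B-
        else:
            label, words = None, []  # "O" tag: outside any span
    if label is not None:
        spans.append((label, " ".join(words)))  # span that runs to the end
    return intent, spans


tokens = ("[CLS] We may also use or display your username and icon or "
          "profile photo on marketing purpose or press releases .").split()
type_1 = ("Data-Collection-Usage B-DC.FPE O O B-Action O O B-DP.U B-DC.UOAP O "
          "B-DC.UOAP I-DC.UOAP I-DC.UOAP I-DC.UOAP O O O O O O O").split()
type_2 = ("Data-Collection-Usage " + "O " * 14 +
          "B-P.AM I-P.AM I-P.AM I-P.AM I-P.AM O").split()

print(bio_to_spans(tokens, type_1))
print(bio_to_spans(tokens, type_2))
```

Merging the Type-I and Type-II spans recovered this way gives the same slots that appear in the sequence-to-sequence output for this sentence.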
# Input
We may also use or display your username and icon or profile photo on marketing purpose or press releases .
# Output
[IN:Data-Collection-Usage [SL:DC.FPE We] [SL:Action use] [SL:DP.U your] [SL:DC.UOAP username] [SL:DC.UOAP icon or profile photo] [SL:P.AM marketing purpose or press releases]]
- [Models] UniLM, UniLMv2, MiniLM, BART
- Implementations are available at https://github.com/wasiahmad/PolicyIE/tree/main/{bart,mass,unilm}.
- Go to the corresponding model directory, use the `prepare.sh` script to prepare the data, and use the `run.sh` script for model training and evaluation. Run `bash run.sh -h` to learn about the command line arguments.
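For the sequence-to-sequence approach, the model emits a bracketed string instead of per-token tags. The sketch below, a hypothetical helper and not code from the repository, parses that format into an intent and a list of (slot label, text) pairs and rebuilds the string from them. It assumes slot values never contain square brackets and that slots are not nested, which holds for the example output above.

```python
import re
from typing import List, Tuple

# Matches one "[SL:label text]" slot; assumes slot text contains no brackets.
SLOT_RE = re.compile(r"\[SL:(\S+) ([^\[\]]*)\]")


def parse_output(output: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Parse '[IN:intent [SL:label text] ...]' into (intent, [(label, text), ...])."""
    match = re.fullmatch(r"\[IN:(\S+)\s*(.*)\]", output.strip(), flags=re.S)
    if match is None:
        raise ValueError("expected an [IN:...] expression")
    intent, body = match.groups()
    return intent, [(label, text.strip()) for label, text in SLOT_RE.findall(body)]


def linearize(intent: str, slots: List[Tuple[str, str]]) -> str:
    """Inverse of parse_output: rebuild a string in the same bracketed format."""
    inner = " ".join("[SL:{} {}]".format(label, text) for label, text in slots)
    return "[IN:{} {}]".format(intent, inner)


prediction = (
    "[IN:Data-Collection-Usage [SL:DC.FPE We] [SL:Action use] [SL:DP.U your] "
    "[SL:DC.UOAP username] [SL:DC.UOAP icon or profile photo] "
    "[SL:P.AM marketing purpose or press releases]]"
)
intent, slots = parse_output(prediction)
print(intent)
for label, text in slots:
    print(label, "->", text)
```

Parsing into (label, text) pairs lets predicted and reference outputs be compared slot by slot rather than as raw strings, which is more tolerant of the model ordering slots differently.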
We acknowledge the efforts of the authors of the following repositories.
- https://github.com/monologg/JointBERT
- https://github.com/microsoft/unilm
- https://github.com/microsoft/MASS
- https://github.com/pytorch/fairseq/tree/master/examples/bart
@inproceedings{ahmad-etal-2021-intent,
title = "Intent Classification and Slot Filling for Privacy Policies",
author = "Ahmad, Wasi and
Chi, Jianfeng and
Le, Tu and
Norton, Thomas and
Tian, Yuan and
Chang, Kai-Wei",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.340",
doi = "10.18653/v1/2021.acl-long.340",
pages = "4402--4417",
}