This repo contains the source code for "A Customized Text Sanitization Mechanism with Differential Privacy" (accepted to Findings of ACL 2023).
The MedSTS dataset comes from the paper "MedSTS: A Resource for Clinical Semantic Textual Similarity" (https://arxiv.org/pdf/1808.09397.pdf).
Because it is a clinical dataset, it cannot be downloaded directly. To request access, contact the paper's first author.
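Example run on SST-2 with privacy budget eps = 1.0, top_k = 20, and the ct_vectors embeddings:

python main.py \
    --dataset sst2 \
    --eps 1.0 \
    --top_k 20 \
    --embedding_type ct_vectors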
The same run with --save_stop_words True added:

python main.py \
    --dataset sst2 \
    --eps 1.0 \
    --top_k 20 \
    --embedding_type ct_vectors \
    --save_stop_words True
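To try several privacy budgets in a row, a small driver script can assemble the commands above. The sketch below is hypothetical: build_command, run_sweep, and the eps values 1.0, 2.0, 3.0 are illustrative and not part of this repo. It uses only the flags shown above, assumes main.py sits in the current directory, and passes --save_stop_words only when it is True, because how main.py parses that value is not documented here.

```python
import subprocess
import sys
from typing import List


def build_command(
    dataset: str = "sst2",
    eps: float = 1.0,
    top_k: int = 20,
    embedding_type: str = "ct_vectors",
    save_stop_words: bool = False,
) -> List[str]:
    """Hypothetical helper: assemble the argument list for main.py from the documented flags."""
    cmd = [
        sys.executable, "main.py",
        "--dataset", dataset,
        "--eps", str(eps),
        "--top_k", str(top_k),
        "--embedding_type", embedding_type,
    ]
    # Assumption: the flag is omitted entirely unless requested, mirroring the
    # second command above (passing "False" could be misread by some parsers).
    if save_stop_words:
        cmd += ["--save_stop_words", "True"]
    return cmd


def run_sweep(eps_values: List[float], dry_run: bool = True) -> List[List[str]]:
    """Hypothetical helper: build one command per eps value and optionally run it."""
    commands = [build_command(eps=e) for e in eps_values]
    for cmd in commands:
        # Print the command as it would be typed (without the interpreter path).
        print("python " + " ".join(cmd[1:]))
        if not dry_run:
            # check=True raises CalledProcessError if main.py exits with an error.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Illustrative eps values; dry_run=True only prints the commands.
    run_sweep([1.0, 2.0, 3.0])
```

Set dry_run=False to actually launch each run; the runs execute sequentially, so a failing budget stops the sweep instead of silently continuing.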