This repository contains the code for the paper "Selection Bias Explorations and Debias Methods for Natural Language Sentence Matching Datasets", which was accepted at ACL 2019.
quantify contains the code for generating weights and the code for Section 2.1, "Quantifying the Biasedness in Datasets", in which we explore the severity of the leakage in six NLSM datasets.
debias contains the code for Section 5, "Experimental Results for the Leakage-neutral Method on QuoraQP", where we apply our leakage-neutral learning to QuoraQP with a classical Siamese-LSTM model.
Usage instructions and requirements are given inside each folder.
We use the following six datasets in our paper:
weights.npy contains the weights for QuoraQP used in our paper. The weights for the Train/Dev/Test sets are concatenated together in that order. We recommend using the QuoraQP released in QuoraQP, since we noticed that there are several versions of QuoraQP which are not exactly the same.
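Because the weights are stored as one concatenated array, they have to be cut back into per-split pieces before use. The snippet below is a minimal sketch of that step, not code from this repository: the helper name `split_weights` and the split sizes are hypothetical, and it assumes the array is one-dimensional along its first axis with one weight per example, ordered Train, Dev, Test.

```python
import numpy as np


def split_weights(weights, n_train, n_dev, n_test):
    """Split a concatenated weight array into train/dev/test parts.

    Hypothetical helper: assumes one weight per example, stored in
    Train, Dev, Test order along the first axis.
    """
    weights = np.asarray(weights)
    total = n_train + n_dev + n_test
    if weights.shape[0] != total:
        # A mismatch usually means a different dataset version is in use.
        raise ValueError(
            f"expected {total} weights, found {weights.shape[0]}"
        )
    # Cut at the train/dev and dev/test boundaries.
    train_w, dev_w, test_w = np.split(weights, [n_train, n_train + n_dev])
    return train_w, dev_w, test_w


# Toy demonstration with made-up sizes (not the real QuoraQP split sizes).
demo = np.arange(10, dtype=float)
train_w, dev_w, test_w = split_weights(demo, n_train=6, n_dev=2, n_test=2)
```

With the real file, the array would be loaded with `np.load("weights.npy")` and split using the sizes of your own QuoraQP Train/Dev/Test files. The length check is a cheap guard against the version mismatch mentioned above.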
If you use the code, please cite the following paper:
@inproceedings{zhang2019selection,
title={Selection Bias Explorations and Debias Methods for Natural Language Sentence Matching Datasets},
author={Zhang, Guanhua and Bai, Bing and Liang, Jian and Bai, Kun and Chang, Shiyu and Yu, Mo and Zhu, Conghui and Zhao, Tiejun},
booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
pages={4418--4429},
year={2019}
}