Selection Bias Explorations and Debias Methods for Natural Language Sentence Matching Datasets

This repository contains the code for the paper Selection Bias Explorations and Debias Methods for Natural Language Sentence Matching Datasets, which was accepted at ACL 2019.

Folders

quantify contains the code for generating weights and the code for Section 2.1, Quantifying the Biasedness in Datasets, in which we explore the severity of the leakage in six NLSM datasets.

debias contains the code for Section 5, Experimental Results for the Leakage-neutral Method on QuoraQP, where we apply our leakage-neutral learning method to QuoraQP with a classical Siamese-LSTM model.

Usage and requirements are described inside each folder.

Datasets

We use the following six datasets in our paper:

Weights

weights.npy contains the weights for QuoraQP used in our paper. The weights for the Train/Dev/Test sets are concatenated together. We recommend using the QuoraQP released in QuoraQP, since we noticed that there are several versions of QuoraQP which are not exactly the same.
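
Because the weights are stored as one concatenated array, they have to be sliced back into per-split arrays before use. The following is only a minimal sketch of one way to do that, not the repository's own loading code. It assumes the concatenation order is Train, then Dev, then Test, and the helper names `split_weights` and `load_and_split` and the split-size arguments are hypothetical. Use the sizes of the QuoraQP split you actually downloaded.

```python
def split_weights(weights, n_train, n_dev, n_test):
    """Slice a concatenated weight sequence into (train, dev, test).

    Assumes the order Train, Dev, Test. Raises ValueError when the
    sizes do not add up to the length of the weight sequence.
    """
    total = n_train + n_dev + n_test
    if len(weights) != total:
        raise ValueError(f"expected {total} weights, got {len(weights)}")
    train = weights[:n_train]
    dev = weights[n_train:n_train + n_dev]
    test = weights[n_train + n_dev:]
    return train, dev, test


def load_and_split(path, n_dev, n_test):
    """Load a .npy weight file and split it (hypothetical helper).

    The train size is inferred as whatever remains after removing the
    dev and test sizes, which the caller must supply.
    """
    import numpy as np  # third-party; only needed to read the .npy file

    weights = np.load(path)
    n_train = len(weights) - n_dev - n_test
    return split_weights(weights, n_train, n_dev, n_test)
```

A call such as `load_and_split("weights.npy", n_dev, n_test)` would then return three arrays. The length check in `split_weights` is a cheap guard against using a QuoraQP version whose size differs from the one the weights were computed on.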

Citation

If you use the code, please cite the following paper:

@inproceedings{zhang2019selection,
  title={Selection Bias Explorations and Debias Methods for Natural Language Sentence Matching Datasets},
  author={Zhang, Guanhua and Bai, Bing and Liang, Jian and Bai, Kun and Chang, Shiyu and Yu, Mo and Zhu, Conghui and Zhao, Tiejun},
  booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
  pages={4418--4429},
  year={2019}
}
