
Semantic Textual Similarity (STS) measures the degree of equivalence in the underlying semantics of paired snippets of text.


Semantic Textual Similarity Toolkits


This repository contains the code submitted by the ECNU team to the SemEval STS task.



# download the repo
git clone https://github.com/rgtjf/Semantic-Texual-Similarity-Toolkits.git
# enter the cloned directory
cd Semantic-Texual-Similarity-Toolkits
# download the dataset and Stanford CoreNLP tools
sh download.sh
# run the demo
python demo.py


You can configure sts_model.py to compare the performance of different features on the STSBenchmark dataset.


Methods                 Dev      Test
----------------------  -------  -------
RF                      0.8333   0.7993
GB                      0.8356   0.8022
EN-seven                0.8466   0.8100
----------------------  -------  -------
aligner                 0.6991   0.6379
idf_aligner             0.7969   0.7622
BOWFeature-True         0.7584   0.6472
BOWFeature-False        0.7788   0.6874
nGramOverlapFeature     0.7817   0.7453
BOWFeature              0.7639   0.6847
AlignmentFeature        0.8163   0.7748
WordEmbeddingFeature    0.8011   0.7128
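
The README does not state which metric the Dev and Test columns report. STS Benchmark results are conventionally given as the Pearson correlation between predicted and gold similarity scores, so the sketch below assumes that. It scores a few sentence pairs with a toy word n-gram overlap feature and correlates the result with gold labels. The functions `ngrams`, `ngram_overlap`, and `pearson`, and the example pairs and gold scores, are hypothetical illustrations; they are not taken from this repository or from the dataset.

```python
import math
from typing import List, Sequence, Set, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(sent_a: str, sent_b: str, n: int = 1) -> float:
    """Jaccard overlap of word n-grams (a toy stand-in, not the repo's feature)."""
    a = ngrams(sent_a.lower().split(), n)
    b = ngrams(sent_b.lower().split(), n)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up sentence pairs with made-up gold scores on a 0-5 scale.
    pairs: List[Tuple[str, str, float]] = [
        ("a man is playing a guitar", "a man is playing a guitar", 5.0),
        ("a man is playing a guitar", "a man plays the guitar", 4.0),
        ("a woman is slicing an onion", "a man is playing a guitar", 0.5),
    ]
    predictions = [ngram_overlap(a, b) for a, b, _ in pairs]
    gold = [score for _, _, score in pairs]
    print("Pearson r: %.4f" % pearson(predictions, gold))
```

Pearson correlation is invariant to linear rescaling, so the feature values do not need to be mapped onto the gold score range before evaluation.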


STSBenchmark board


If you have any questions, please feel free to contact us: rgtjf1 AT 163 DOT com


If you find this repository helpful, please cite our paper.

@inproceedings{tian-etal-2017-ecnu,
    title = "{ECNU} at {S}em{E}val-2017 Task 1: Leverage Kernel-based Traditional {NLP} features and Neural Networks to Build a Universal Model for Multilingual and Cross-lingual Semantic Textual Similarity",
    author = "Tian, Junfeng  and
      Zhou, Zhiheng  and
      Lan, Man  and
      Wu, Yuanbin",
    booktitle = "Proceedings of the 11th International Workshop on Semantic Evaluation ({S}em{E}val-2017)",
    year = "2017",
    url = "https://aclanthology.org/S17-2028",
    pages = "191--197"
}