Multilingual Semantic Textual Similarity

Semantic textual similarity (STS) is a natural language processing (NLP) task that quantitatively assesses the semantic similarity between two text snippets. STS underpins many text-related applications, including text de-duplication, paraphrase detection, semantic search, and question answering. Measuring STS is typically framed as a machine learning (ML) problem, in which a model predicts a value that represents the similarity of the two input texts.
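
To make the input/output shape of the task concrete, here is a minimal sketch of an STS scorer: it takes two texts and returns a single similarity value. It is only a toy bag-of-words cosine baseline written with the Python standard library, not one of the deep learning models evaluated in this project, and the function names `tokenize` and `cosine_similarity` are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Return the cosine similarity of the two texts' word-count vectors.

    Toy baseline (an assumption for illustration): it only measures word
    overlap, so it cannot recognise synonyms or paraphrases.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))

    # Dot product over the words that occur in both texts.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)

    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty text has no direction in the vector space; score it as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    score = cosine_similarity(
        "A man is playing a guitar.",
        "A man plays the guitar.",
    )
    print(f"similarity: {score:.3f}")
```

A score near 1 means the two texts share most of their words and a score of 0 means they share none. Because "playing" and "plays" count as different words here, this baseline underestimates the similarity of the example pair, which is the kind of gap that learned STS models aim to close.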

This project catalogues the datasets annotated for semantic textual similarity and performs a comparative evaluation of several deep-learning-based STS methods on these datasets. This site presents the datasets and pre-trained models evaluated in the Multilingual Semantic Textual Similarity project.

Acknowledgement

We would like to acknowledge the following researchers, who helped us with the error analysis.

  1. Claudia Schmitt Baeza
  2. Ana Isabel Cespedosa
  3. Maria Ferragud Ferragud
  4. Marie Escribe
  5. Marie Picot
  6. Rocío Caro Quintana
  7. Chrisse Amelie Soukai

Citation

If you use these resources, please cite (and read!) our paper:

Coming soon