SemEval 2024 Task 1: Semantic Textual Relatedness

This repository contains the data and resources for the SemEval 2024 Task 1: Semantic Textual Relatedness (STR). For more information, please visit the shared task and competition websites.

Languages

The STR task focuses on the following 14 languages:

  1. Afrikaans
  2. Algerian Arabic
  3. Amharic
  4. English
  5. Hausa
  6. Indonesian
  7. Hindi
  8. Kinyarwanda
  9. Marathi
  10. Modern Standard Arabic
  11. Moroccan Arabic
  12. Punjabi
  13. Spanish
  14. Telugu

Dataset

The STR dataset is available in the data folder or can be downloaded from Hugging Face.

Subtasks

  • Subtask A: see the SubtaskA folder
  • Subtask B: see the SubtaskB folder

Shared Task Starter Kit

A starter kit is available to help you produce a baseline result. Open it in a Colab Notebook to run the baseline system, then submit the resulting output to CodaLab to confirm that your submission format is correct.

To run the Colab Notebook, first fork this repo, then click the "Open in Colab" badge in your fork.

  • Subtask A: Open In Colab
  • Subtask B: Open In Colab
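
This README does not describe what the starter-kit baseline actually does, so the following is only an illustrative sketch of what a minimal relatedness baseline could look like, not the starter kit's code. It assumes a simple lexical-overlap approach (the Dice coefficient over whitespace tokens); the function names and the example sentences are hypothetical and do not come from the STR data.

```python
def tokenize(sentence: str) -> set[str]:
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return set(sentence.lower().split())


def dice_relatedness(first: str, second: str) -> float:
    """Return the Dice coefficient of the two sentences' token sets, in [0, 1]."""
    a, b = tokenize(first), tokenize(second)
    if not a and not b:
        # Two empty sentences share no tokens; treat the score as 0.
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical sentence pairs, made up for illustration (not from the STR data).
pairs = [
    ("a dog runs in the park", "a dog runs in the garden"),
    ("a dog runs in the park", "stock prices fell sharply"),
]
scores = [dice_relatedness(s1, s2) for s1, s2 in pairs]
```

The first pair shares five of six tokens on each side and scores higher than the second pair, which shares none. Whitespace tokenization is a simplifying assumption and may be a poor fit for some of the task's languages.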

Citing This Work

If you use our dataset or participate in the STR task, please cite the following papers:

  • STR dataset paper: coming soon
  • STR SemEval task description paper: coming soon