This repository contains the data and resources for the SemEval 2024 Task 1: Semantic Textual Relatedness (STR). For more information, please visit the shared task and competition websites.
Dataset | Languages | Shared Task Starter Kit | Citing This Work
The STR dataset is available in the data folder. A Hugging Face download is coming soon.
- For Track A: TrackA folder
- For Track B: TrackB folder
- For Track C: TrackC folder
The STR task focuses on the following 14 languages:
- Afrikaans (afr released)
- Algerian Arabic (arq released)
- Amharic (amh released)
- English (eng released)
- Hausa (hau released)
- Hindi
- Indonesian
- Kinyarwanda
- Marathi (mar released)
- Modern Standard Arabic (arb released)
- Moroccan Arabic (ary released)
- Punjabi
- Spanish (esp released)
- Telugu (tel released)
A starter kit is available to help you create a baseline result. You can open the starter kit in a Colab Notebook and run the baseline system. You can then submit the resulting output to CodaLab to check that your submission format is correct.
To run the Colab Notebook, click the "Open in Colab" badge.
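To give a rough idea of what a simple relatedness baseline can look like, here is a minimal sketch that scores a sentence pair by lexical overlap (the Dice coefficient over whitespace tokens). This is an illustration under our own assumptions, not necessarily the method used in the starter kit: the function name `dice_overlap` and the example sentences are hypothetical, and the data format of the task files is not assumed here.

```python
def dice_overlap(sentence_a: str, sentence_b: str) -> float:
    """Dice coefficient over lowercase whitespace tokens, in [0, 1]."""
    tokens_a = set(sentence_a.lower().split())
    tokens_b = set(sentence_b.lower().split())
    # Two empty sentences share no tokens; return 0.0 to avoid dividing by zero.
    if not tokens_a and not tokens_b:
        return 0.0
    shared = len(tokens_a & tokens_b)
    return 2 * shared / (len(tokens_a) + len(tokens_b))


# Hypothetical sentence pairs for illustration only (not taken from the STR data).
pairs = [
    ("a dog runs in the park", "a dog runs in the garden"),
    ("a dog runs in the park", "stock prices fell sharply today"),
]

# One relatedness score per pair; higher means more shared vocabulary.
scores = [dice_overlap(a, b) for a, b in pairs]

for (a, b), score in zip(pairs, scores):
    print(f"{score:.3f}\t{a}\t{b}")
```

Whitespace tokenization is a simplification that will not suit every language in the task, so treat the scores as a sanity check rather than a competitive system.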
If you use our dataset or participate in the STR task, please cite the following papers:
- STR dataset paper: coming soon
- STR SemEval task description paper: coming soon