EMNLP 2024 Code Repository

Welcome to the official code repository for our EMNLP 2024 submission: "Automated Tone Transcription and Clustering with Tone2Vec."

This repository contains all the experimental details and code discussed in our paper.

1. Repository Contents

Code_emnlp2024.ipynb: Jupyter notebook containing all experimental code. The experiments were run in Google Colab.
emnlp_weights/: Directory containing precomputed Tone2Vec representations.

The notebook is structured to align with the paper's sections and tables:

Tone2Vec (Section 5): Covers the pitch-based similarity tone representation, dialect clustering, and variance analysis.
Tone Transcription (Section 6.2): Details the methods and results for tone transcription accuracy across different conditions (Tables 2 and 3).
Tone Clustering (Section 7.2): Reports the results shown in Tables 4 and 5 and Figure 6, which explore tone clustering to identify patterns and groupings in the data.
Additional Experiments: Covers experiments conducted during the rebuttal stage to address reviewer feedback and validate the findings.
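
As a rough illustration of what a pitch-based similarity between tones can look like, the sketch below treats a tone as a string of pitch levels (for example "55" or "214"), interpolates a piecewise-linear contour through those levels, and measures the area between two contours over normalized time. The digit-string encoding, the linear interpolation, and the names `contour` and `tone_distance` are assumptions made for this example only. They are not taken from the notebook, which remains the reference for the actual Tone2Vec computation.

```python
def contour(tone: str, samples: int = 101) -> list[float]:
    """Piecewise-linear pitch contour through the digits of a tone string.

    Assumption for illustration: each character of `tone` is a pitch level,
    and the levels are spread evenly over normalized time [0, 1].
    """
    levels = [float(ch) for ch in tone]
    if len(levels) == 1:
        return [levels[0]] * samples
    segments = len(levels) - 1
    points = []
    for i in range(samples):
        pos = i / (samples - 1) * segments   # position measured in segments
        k = min(int(pos), segments - 1)      # index of the current segment
        frac = pos - k                       # progress within that segment
        points.append(levels[k] + frac * (levels[k + 1] - levels[k]))
    return points


def tone_distance(a: str, b: str, samples: int = 101) -> float:
    """Area between two contours, approximated with the trapezoidal rule."""
    gaps = [abs(x - y) for x, y in zip(contour(a, samples), contour(b, samples))]
    step = 1 / (samples - 1)
    return sum((gaps[i] + gaps[i + 1]) / 2 * step for i in range(samples - 1))


if __name__ == "__main__":
    # Hypothetical tone inventory, used only to show the pairwise distances.
    tones = ["55", "51", "35", "214"]
    for a in tones:
        print(a, [round(tone_distance(a, b), 3) for b in tones])
```

Under these assumptions, identical tones have distance zero and the distance is symmetric, so the resulting pairwise matrix can be passed to any standard clustering routine.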

Please Note: This is an anonymous version intended for review purposes only.

The official package will be released upon acceptance of the paper. Please do not distribute this version.