Pcc-tuning: Breaking the Contrastive Learning Ceiling in Semantic Textual Similarity
Our paper is the first to propose and substantiate a theoretical upper bound on the performance of contrastive learning methods. In addition, Pcc-tuning is the first method to achieve Spearman's correlation scores above 87 on the standard STS benchmarks, a significant advance in the field.
This paper has been accepted to EMNLP 2024 (Main Conference).
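As the name suggests, the second stage of Pcc-tuning optimizes Pearson's correlation coefficient (PCC) between the model's predicted similarities and fine-grained human annotations, rather than a contrastive objective. The sketch below is a minimal PyTorch illustration of such a loss, not the repository's actual implementation; the function name `pcc_loss` and the exact formulation are assumptions.

```python
import torch

def pcc_loss(pred: torch.Tensor, gold: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Illustrative objective: 1 - Pearson correlation between predicted
    similarities and gold similarity scores (assumed formulation)."""
    pred_c = pred - pred.mean()
    gold_c = gold - gold.mean()
    corr = (pred_c * gold_c).sum() / (pred_c.norm() * gold_c.norm() + eps)
    return 1.0 - corr

# Toy usage: cosine similarities of sentence pairs vs. STS-B style 0-5 ratings.
pred = torch.tensor([0.82, 0.31, 0.65, 0.12], requires_grad=True)
gold = torch.tensor([4.8, 1.2, 3.9, 0.5])
loss = pcc_loss(pred, gold)
loss.backward()
```

Since Pearson's correlation is invariant to shifting and scaling of the predictions, an objective of this form can exploit graded similarity annotations that a binary contrastive loss cannot.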
- Stage one: `nli_for_simcse.csv`
- Stage two: `merged-SICK-STS-B-train.jsonl`
- Download link: https://drive.google.com/drive/folders/1M6zXUQ-XCe7bgYD6th6T-5mCInpnUwbV?usp=sharing
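The stage-two file is in JSON-Lines format with scored sentence pairs. Below is a minimal loading sketch; the field names `sentence1`, `sentence2`, and `score` are assumptions about the schema, not verified against the actual file.

```python
import json

# Hypothetical schema: one scored sentence pair per line, e.g.
# {"sentence1": "...", "sentence2": "...", "score": 3.8}
pairs = []
with open("merged-SICK-STS-B-train.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        pairs.append((record["sentence1"], record["sentence2"], float(record["score"])))

print(f"Loaded {len(pairs)} scored sentence pairs")
```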
- Python Version: 3.9.18
- Install Dependencies

  ```bash
  cd code
  pip install -r requirements.txt
  ```
- Download SentEval

  ```bash
  cd SentEval/data/downstream/
  bash download_dataset.sh
  ```
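Once the SentEval data is in place, STS results are typically obtained through SentEval's evaluation engine. The following is a minimal sketch of that loop under stated assumptions: the random-vector `batcher` is a placeholder to be replaced by the trained model's encoder, and the `task_path` value assumes the layout produced by the download script.

```python
import numpy as np
import senteval

def prepare(params, samples):
    # Nothing to precompute for this placeholder encoder.
    return

def batcher(params, batch):
    # Placeholder: random 768-d embeddings. Replace with the trained
    # model's sentence encoder to obtain real scores.
    return np.random.rand(len(batch), 768)

params = {"task_path": "SentEval/data", "usepytorch": True, "kfold": 5}
se = senteval.engine.SE(params, batcher, prepare)
results = se.eval(["STS12", "STS13", "STS14", "STS15", "STS16",
                   "STSBenchmark", "SICKRelatedness"])
print(results["STSBenchmark"])
```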
- Stage One

  ```bash
  cd code
  nohup torchrun --nproc_per_node=4 train.py > nohup.out &  # 4 x RTX 4090
  ```
- Stage Two

  ```bash
  cd code
  nohup torchrun --nproc_per_node=4 tune.py > nohup.out &  # 4 x RTX 4090
  ```
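Both stages assume four GPUs. With a different number of devices, adjusting torchrun's `--nproc_per_node` flag accordingly should be sufficient, though the effective batch size, and possibly the reproduced scores, will change.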
- Our code is based on PromptEOL
- Github: STS-Regression
  Paper: Pcc-tuning: Breaking the Contrastive Learning Ceiling in Semantic Textual Similarity
  Conference: 🌟 EMNLP 2024, Main
- Github: CoT-BERT
  Paper: CoT-BERT: Enhancing Unsupervised Sentence Representation through Chain-of-Thought
  Conference: 🌟 ICANN 2024, Oral
- Github: PretCoTandKE
  Paper: Simple Techniques for Enhancing Sentence Embeddings in Generative Language Models
  Conference: 🌟 ICIC 2024, Oral