Semantic Textual Similarity (STS) measures the degree of semantic equivalence between two sentences. STS is essential to other NLP tasks such as machine translation, summarization, and question answering. Our system performs classification and regression on KLUE-STS. We implement the system using LTNtorch.
After cloning this repository, make sure to install all the requirements.
git clone git@github.com:chrisjihee/LTN-STS.git
pip3 install -r requirements.txt
After installation, please check the usage of the main module.
python3 main.py -h
usage: main.py [-h] -t T [-n N] [-m M] [-k K] [-e E] [-lr LR] [-bs BS] [-msl MSL]

optional arguments:
  -h, --help  show this help message and exit
  -t T        task name: STS-CLS, STS-REG
  -n N        gpu id: 0, 1, 2, 3
  -m M        pretrained model id: 0, 1, 2, 3
  -k K        number of training samples
  -e E        number of training epochs
  -lr LR      learning rate
  -bs BS      batch size
  -msl MSL    max sequence length
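For readers who want to see how such a command line could be declared, here is a minimal sketch using Python's standard `argparse`. It only mirrors the flags in the usage message above; the value types and the choice to leave defaults unset are assumptions, and the actual parser in main.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose flags mirror the usage message.

    Value types are assumptions; defaults are left unset (None)
    rather than guessing the values used by main.py.
    """
    parser = argparse.ArgumentParser(prog="main.py")
    # -t is the only required option in the usage line.
    parser.add_argument("-t", required=True, help="task name: STS-CLS, STS-REG")
    parser.add_argument("-n", type=int, help="gpu id: 0, 1, 2, 3")
    parser.add_argument("-m", type=int, help="pretrained model id: 0, 1, 2, 3")
    parser.add_argument("-k", type=int, help="number of training samples")
    parser.add_argument("-e", type=int, help="number of training epochs")
    parser.add_argument("-lr", type=float, help="learning rate")
    parser.add_argument("-bs", type=int, help="batch size")
    parser.add_argument("-msl", type=int, help="max sequence length")
    return parser


# Parse an example argument list instead of sys.argv.
args = build_parser().parse_args(
    ["-t", "STS-CLS", "-n", "0", "-m", "2", "-k", "100", "-e", "1"]
)
print(args.t, args.n, args.m, args.k, args.e)
```

Options that are not passed, such as `-lr`, come back as `None` in this sketch, so the caller would have to supply its own fallback values.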
After checking the usage, run the main module with appropriate options, for example:
python3 main.py -t STS-CLS -n 0 -m 2 -k 100 -e 1
The results are as follows:
- Classification: F1 of 0.8437 (dev) with KoELECTRA
- Regression: Pearson's r of 0.9290 (dev) with KoELECTRA
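The two metrics named above can be illustrated with a small, dependency-free sketch. It assumes that the classification score is a binary F1 on the positive class and that the regression score is the sample Pearson correlation between predicted and gold similarity scores. The function names are hypothetical, and the repository may compute these scores differently, for example through a library.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score of the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy values for illustration only; these are not results from the system.
print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(binary_f1([1, 0, 1, 1], [1, 0, 0, 1]))
```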
- expr1.ipynb: this notebook contains experiments using LTN-STS with KoBERT.
- expr2.ipynb: this notebook contains experiments using LTN-STS with KoELECTRA.
- expr3.ipynb: this notebook contains experiments using LTN-STS with KoBigBird.
- main.py: this module contains the implementation of LTN-STS.
- data.py: this module converts the original KLUE-STS dataset into a dataset for each task.
- data/: this folder contains the data for our experiments.
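The conversion step attributed to data.py can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: it assumes the record layout of the public KLUE-STS release (`sentence1`, `sentence2`, and a `labels` object holding `real-label`), and it assumes the classification label is obtained by thresholding the real-valued score at 3.0. The function name `to_task_examples` and the output layout are hypothetical.

```python
from typing import Any, Dict, Tuple

Example = Dict[str, Any]


def to_task_examples(record: Dict[str, Any], threshold: float = 3.0) -> Tuple[Example, Example]:
    """Turn one KLUE-STS record into a classification and a regression example.

    Assumed input fields: "sentence1", "sentence2", "labels"["real-label"].
    The threshold of 3.0 for the binary label is an assumption.
    """
    score = float(record["labels"]["real-label"])
    base = {"sentence1": record["sentence1"], "sentence2": record["sentence2"]}
    # Classification: 1 if the pair is judged similar enough, else 0.
    cls_example = {**base, "label": int(score >= threshold)}
    # Regression: keep the real-valued similarity score as the target.
    reg_example = {**base, "label": score}
    return cls_example, reg_example


# Made-up record for illustration; real KLUE-STS sentences are in Korean.
sample = {
    "sentence1": "The room was clean and quiet.",
    "sentence2": "The room was quiet and tidy.",
    "labels": {"real-label": 3.7},
}
cls_ex, reg_ex = to_task_examples(sample)
print(cls_ex["label"], reg_ex["label"])
```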
LTN-STS has been developed thanks to the following people.