How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation
Code and data for our CoNLL 2020 publication: "How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation"
```
@inproceedings{eger-etal-2020-probe,
    title = "How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation",
    author = "Eger, Steffen and
      Daxenberger, Johannes and
      Gurevych, Iryna",
    booktitle = "Proceedings of the 24th Conference on Computational Natural Language Learning",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.conll-1.8",
    pages = "108--118",
}
```
Our implementation builds on SentEval to train and evaluate classifiers on a given sentence embedding. The following features were added:
- Change the size of a dataset while maintaining its class balance
- Change the balance between classes in a given dataset
- Use the Random Forest and Naive Bayes classifiers from scikit-learn
- Automatically tune hyperparameters for MLP and Random Forest
- Train various sentence embeddings
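The first two features above (shrinking a dataset while preserving its class distribution, and changing the class balance) can be pictured as stratified subsampling. The following is a minimal sketch of that idea, not the repository's actual code; the function name and signature are hypothetical:

```python
import random
from collections import defaultdict

def subsample_balanced(samples, labels, fraction, seed=0):
    """Keep `fraction` of each class so the overall class
    proportions are preserved (hypothetical illustration)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        rng.shuffle(xs)
        k = max(1, int(len(xs) * fraction))  # same share of every class
        out_x.extend(xs[:k])
        out_y.extend([y] * k)
    return out_x, out_y
```

Changing the balance (e.g. to 1:5) works the same way, except that each class gets its own target fraction.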
We also added English datasets, as well as datasets in the following languages:
- Turkish (tr)
- Russian (ru)
- Georgian (ka)
The following probing and downstream tasks were added to SentEval:
Task | Type | Description | Example | Command Line Argument |
---|---|---|---|---|
Voice | Probing | Whether sent. contains a passive construct | He likes cats ⟶ False | Voice |
Subject Verb Agreement | Probing | Whether subject and verb agree | They works together ⟶ Disagree | SubjVerbAgreement |
Subject Verb Distance | Probing | Distance between subject and verb | The delivery was very late ⟶ 1 | SubjVerbDistance |
Argumentation Mining | Downstream | Whether sent. supports or opposes a given topic | (abortion, Abortion is basically murder!) ⟶ opposing | MArgMin |
Sentiment Analysis | Downstream | Positive, negative or neutral sentiment | Never fails to disappoint ⟶ NEG | MSenti |
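To make the Subject Verb Distance example above concrete: under the assumption that the label is simply the number of token positions separating subject and verb (the repository derives these positions from parses), the computation is:

```python
def subj_verb_distance(tokens, subj_idx, verb_idx):
    """Token distance between subject and verb, assuming both
    positions are already known (illustrative sketch only)."""
    return abs(verb_idx - subj_idx)

tokens = ["The", "delivery", "was", "very", "late"]
# subject "delivery" at index 1, verb "was" at index 2 -> distance 1
```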
To set up a virtual environment and install the dependencies:

```
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
Our pretrained multilingual Infersent, Quickthought and RandomLSTM embeddings (checkpoints) can be downloaded from here and placed in `sentence_embeddings/embedder_data`.
To download further embeddings:

```
cd sentence-embeddings
./download_requirements.sh
cd senteval/data/downstream
./get_transfer_data.bash
```
The purpose of `__main__.py` is to generate sentence embeddings and automatically execute our modified version of SentEval on them.
To run an experiment, specify a list of sentence embeddings and tasks, an output file and a classifier. You can specify additional parameters to modify the dataset before running experiments.
Example:

```
python . -s avg -t WordContent -f results.json --mlp
```
The parameter `-s` specifies the sentence embedding, `-t` a list of tasks, and `-f` the result file; `--mlp` sets the classifier to a multilayer perceptron. For an exhaustive list of command line parameters, run `python . --help`.
If the result file already exists and contains valid JSON, the new results will be merged with the existing results. The results are written in the following JSON format:

```
{
    "Experiment 1 parameters": {
        "Sentence embedding 1": {
            "Task 1": {
                ...
            },
            "Task 2": {
                ...
            }
        }
    }
}
```
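The merging behavior described above amounts to a recursive dictionary merge: nested keys from the new run are added under the existing experiment parameters, and leaves for the same task are overwritten. A sketch of that behavior (a hypothetical helper, not the repository's exact code):

```python
def merge_results(old, new):
    """Recursively merge `new` into `old`; on conflicts,
    non-dict values from `new` win (illustrative sketch)."""
    for key, value in new.items():
        if key in old and isinstance(old[key], dict) and isinstance(value, dict):
            merge_results(old[key], value)
        else:
            old[key] = value
    return old

# Usage idea: load results.json with json.load, merge the new run's
# dict into it with merge_results, then write it back with json.dump.
```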
The file `log.txt` contains all parameters, results, and the logs from SentEval.
The following table lists the values we assigned to the command line parameters in our experiments. The corresponding shell commands are given below.
Parameter | Values | Meaning |
---|---|---|
sentence_embeddings | avg; pmean; randomLSTM; infersent; quickthought; LASER; averageMultilingualBERT | Average Pooling; Power Means; Random LSTM; Infersent; QuickThoughts; LASER; mBERT with average pooling |
ntrain | 0.1; 0.5; 1; 100000; 30000; 20000; 10000; 5000; 2000 | 10%; 50%; 100% of the training data, or an absolute number of samples |
lang | en; ru; tr; ka | English; Russian; Turkish; Georgian |
balance | 1 5; 1 10 | 1:5; 1:10 ratio between the class sizes |
```
export embeddings="avg pmean randomLSTM infersent quickthought LASER averageMultilingualBERT"
```
English Probing: for each `$classifier` in `mlp`, `log_reg`, `random_forest`, `naive_bayes` and for each `$ntrain` in `100000`, `30000`, `20000`, `10000`, `5000`, `2000`:

```
python . --lang en -s $embeddings --$classifier --ntrain $ntrain -t Length WordContent Depth TopConstituents BigramShift SubjNumber SubjVerbAgreement SubjVerbDistance Voice -f results.json
```
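The sweep over classifiers and training-set sizes can also be scripted. The sketch below only builds the 24 command lines (4 classifiers × 6 sizes) from the values above and prints them; passing each one to `subprocess.run(cmd.split())` would execute the runs:

```python
from itertools import product

# Values taken from the experiment description above.
classifiers = ["mlp", "log_reg", "random_forest", "naive_bayes"]
ntrains = [100000, 30000, 20000, 10000, 5000, 2000]
tasks = ("Length WordContent Depth TopConstituents BigramShift "
         "SubjNumber SubjVerbAgreement SubjVerbDistance Voice")

def sweep_commands():
    """One command line per (classifier, ntrain) pair."""
    return [
        f"python . --lang en -s avg --{clf} --ntrain {n} -t {tasks} -f results.json"
        for clf, n in product(classifiers, ntrains)
    ]

for cmd in sweep_commands():
    print(cmd)
```

Here `-s avg` stands in for the full `$embeddings` list exported above.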
Multilingual Probing: for each `$lang` in `tr`, `ru`, `ka` (please note that not all probing tasks are available for all languages):

```
python . --lang $lang --log_reg --ntrain 10000 -t Length WordContent Depth TopConstituents BigramShift SubjNumber SubjVerbAgreement SubjVerbDistance Voice -f results.json
```
Multilingual Downstream: for each `$lang` in `tr`, `ru`, `ka`:

```
python . --log_reg --lang $lang -t MArgMin MSenti MTREC
```

and for English:

```
python . --log_reg --lang en -t MArgMin MSenti TREC
```